Assistant Commissioner for Patents
Washington, D.C. 20231 

Dear Sir: 

I, Pia M. Challita-Eid, declare as follows: 
1. I have a Ph.D. in Microbiology from University of Southern California, did postdoctoral
work at University of California at Los Angeles, and was a faculty member at the University
of Rochester. I have been practicing in the field of molecular biology for over 10 years. At
Agensys, I am the Group Leader of Gene Discovery. In my position at Agensys, I have 
responsibility for evaluating the levels of expression of various genes in tissues. A copy of 
my curriculum vitae is enclosed as Exhibit A. 



2. Our company, Agensys, is dedicated to discovery of proteins that are highly expressed in 
various tumor tissues as compared to normal tissues. The company approaches this 
discovery task by first identifying cDNAs which correspond to genes overexpressed in tumor 
tissue using the technique of suppression subtractive hybridization (SSH). In this technique, 
cDNA from normal tissues is subtracted from cDNA from tumor tissues. Thereby, cDNA 
present in tumor tissues but not in normal tissues is isolated. Thus, on a gene-by-gene basis, 
this approach can indicate that a gene corresponding to the cDNA is overexpressed in tumor 
cells. 



3. Typically, the next step is to utilize the sequence information obtained from SSH to obtain a 
full-length DNA clone which includes the entire open reading frame for the protein 
corresponding to this cDNA. 

4. In addition, the level of expression of the corresponding gene is determined in various 
normal tissues and in various tumor tissues and tumor cell lines using the technique of 
Northern blotting, which detects production of messenger RNA. It is well known that the 
production of messenger RNA, that encodes the protein, is a necessary step in the production 
of the protein itself. Therefore, detection of high levels of messenger RNA by, for example, 
Northern blot, is a way of determining that the protein itself is produced. 

5. Northern blotting is a method for detecting the relative levels of mRNA expression of a gene. It
is a procedure in which specific mRNA is measured using a nucleic acid hybridization
technique. The signal is detected on an autoradiogram. The stronger the signal, the more
abundant is the mRNA. For genes that produce mRNA that contains an open reading frame
flanked by a good Kozak translation initiation site and a stop codon, in the majority of cases
the synthesized mRNA codes for a protein. Kozak translation initiation sites are discussed in
greater detail in paragraph 7, below.

6. The evidence referred to in paragraphs 3, 4 and 5 above is consistent with the general 
knowledge in the art of molecular biology that, with rare exceptions, expression of a 
polynucleotide is predictive of expression of the corresponding protein. This is particularly
true for mRNA with an open reading frame and a Kozak consensus sequence for translation
initiation.

7. The consensus Kozak initiation site CCACCATGG, in which ATG is the start codon,
refers to the "optimum" translation initiation sequence. Peri and Pandey, Trends
in Genetics (2001) 17: 685-687, describe a study of over 1500 translation initiation sites
undertaken to characterize natural mRNA translation initiation. This study showed that most
authentic initiation sequences have 3 or more mismatches from the optimum consensus Kozak
sequence CCACC ATG G. The sequence of the translation initiation site of PHELIX,
TCAAC ATG G, shows only 2 nucleic acid differences from the optimum Kozak consensus.
Also, the translation initiation site of PHELIX contains a G at position +4, which has been
shown to significantly augment translation efficiency (Kozak (1997) EMBO J 16:2482-92).
Altogether, these data demonstrate that the translation initiation site of PHELIX is functional
and can initiate protein translation.
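
The mismatch count given in paragraph 7 can be checked mechanically. The following is a minimal Python sketch offered for illustration only; it is not part of the original analysis. The function name `count_mismatches` is hypothetical, and the two sequences are the ones quoted in paragraph 7 (positions -5 to -1, the ATG start codon, and position +4).

```python
# Sequences as quoted in paragraph 7 (positions -5..-1, ATG, +4).
KOZAK_CONSENSUS = "CCACCATGG"
PHELIX_SITE = "TCAACATGG"


def count_mismatches(site: str, consensus: str = KOZAK_CONSENSUS) -> int:
    """Count the positions at which `site` differs from `consensus`.

    Hypothetical helper: a plain position-by-position comparison of two
    equal-length sequences, ignoring case.
    """
    if len(site) != len(consensus):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(site.upper(), consensus.upper()))


if __name__ == "__main__":
    # Differences fall at positions -5 (T vs C) and -2 (A vs C).
    print(count_mismatches(PHELIX_SITE))  # prints 2
```

A simple position-by-position comparison is enough here because both sequences are already aligned on the ATG start codon.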

8. The Northern blot technique is used as a routine procedure (as compared to Western blotting, 
immunoblotting or immunohistochemistry) because it does not require the time delays 
involved in isolating or synthesizing the protein, preparing an immunological composition of 
the protein, eliciting a humoral immune response, harvesting the antibodies, and verifying the 
specificity thereof. All of these things can be done, but they take time, and the presence of 
mRNA on Northern blots, especially in comparative tissues, is a recognized indication that
the protein itself will be produced. 

9. I am familiar with the general practice of Northern blotting and interpretation, described 
above, being carried out, not only at Agensys, but also at other companies that seek to 
evaluate gene expression in various tumor and other tissues. The use of Northern blots as a 
means for evaluating protein production is universally accepted as reliable and is therefore 
widely practiced. 

10. It is understood that the absolute levels of messenger RNA present and the amounts of
protein produced do not always provide a 1:1 correlation. However, in those instances where
the Northern blot has shown mRNA to be present, it is almost always possible, in my
experience, when the time is taken to do so, to detect the presence of the corresponding
protein in the tissue which provided a positive result in the Northern blot. The levels of the
protein compared to the levels of the mRNA may be disjunctive, but it would be inaccurate
to say that there is no correlation between protein levels and mRNA levels as a general
matter. In general, cells that exhibit detectable mRNA also exhibit detectable corresponding
protein and vice versa. This is particularly true where the mRNA has an open reading frame
and a good Kozak sequence.

11. Ironically, studies seeking to determine the overall pattern of correlation between mRNA and
corresponding protein have started with displaying the protein fingerprint of a particular cell
or tissue. For instance, an article by Anderson, L. and Seilhamer, J., Electrophoresis (1997)
18:533-537 (Exhibit B) describes such a study on a patient liver. A 2D gel was obtained to
determine the pattern of proteins in the liver, and a cDNA library was used to determine the
pattern for mRNA. The authors found that of 23 selected proteins which could be identified
from the gel, mRNA for 19 were detected in the transcript images. Thus, in the vast majority
of cases, there was both mRNA and protein present. The authors found that the levels of
mRNA units to protein units had a correlation coefficient of 0.48. As they state, this number is
intriguingly close to the middle position between a perfect correlation (1.0) and no
correlation whatever (0.0). Only a correlation coefficient of (0.0) would support a
proposition that mRNA presence provides no indication of protein presence. The conclusion
has to be that in the vast majority of instances, i.e., any correlation coefficient other than
(0.0), where mRNA is present protein is also present. It is inaccurate to say that there is no
correlation between mRNA expression and protein expression.

12. An article by Oh, J.M.C., et al, Proteomics (2001) 1:1303-1319 reports a database of 
protein expression in lung cancer. Again, the study sought to determine the correlation 
between mRNA and corresponding protein beginning with protein fingerprint display of a 
particular cell or tissue. Protein expression was evaluated using 2D gels and mRNA 
expression was evaluated using microarrays. The approach is suggested as a tool for 
evaluating, genetically, the correlation between mRNA expression and protein expression. 
Clearly it is expected that the correlation will not be zero or the tool would not even be 
proposed. 

13. I am aware that the Examiner has cited a publication by Fu, L., et al., EMBO Journal (1996)
15:4392-4401 which reports an extremely rare occurrence where there does appear to be
zero production of any protein even in the presence of mRNA. This is for the specific
protein p53. I am not familiar with any other instances where this occurs. This is an
exception to the rule that there is at least some correlation between mRNA presence and
protein production. This is supported by the publication itself; were this not an unusual
occurrence, this lack of correlation would not merit publication at all.

14. In many cases, a reported lack of protein expression is due to technical limitations of the 
protein detection assay. For instance, the available antibody may only detect denatured 
protein but not native protein present in a cell. In other instances, the half-life of the protein
is very short, so that the steady-state protein levels are below the detectable range. Short-lived
proteins are still functional, and some have been previously described to induce tumor
formation, as shown in the article by Reinstein et al., Oncogene 19: 5944-50. In such
situations, when more sensitive detection techniques are performed and/or other antibodies 
are generated, protein expression is detected. When studies fail to take these principles into 
account, they are likely to report artifactually lowered correlations of mRNA to protein. 



15. A previous declaration has been submitted in this case to demonstrate that, at least in 293
cells, it is possible to produce the protein encoded by the PHELIX gene. As described in
Dr. Hubert's declaration, this has been verified by producing antibodies raised against a
15-mer peptide designed from the PHELIX coding region. This demonstrates that in 293 cells,
there is no translational inhibition to the production of protein.



16. The production of protein in the 293 cells shows conclusively that for those tumor cells, and
by analogy for tumor cell lines where mRNA is also shown to be present, the PHELIX
protein is present as well. The reason I conclude this is that in this experiment, when
PHELIX mRNA was made, PHELIX protein was also produced and detected. This shows
that the PHELIX mRNA is stable, functional and codes for a protein, and that the
translation initiation and termination sites of PHELIX are functional sites and lead to the
production of a detectable PHELIX protein.

17. For most genes that produce mRNA containing an open reading frame flanked by a
good Kozak translation initiation site and a stop codon, the synthesized mRNA codes for a
protein. Analysis of PHELIX shows a strong mRNA signal on Northern blot in cancer
tissues, and the mRNA sequence shows an open-reading frame containing a good Kozak 
initiation site and a stop codon. Therefore, production of PHELIX protein is reasonably 
predicted based on this data. 

18. In summary, the scientific community regards the presence of mRNA in cells as indicative of
the production of protein. This is particularly true when the Northern data is strong and the
mRNA has an open reading frame and a good Kozak sequence. It is understood that the
correlation of mRNA and protein levels is not perfect; however, instances such as those in
Fu, where protein is absent although mRNA is present at high levels, are a rare exception.

19. The use of positive Northern blots as indicative of and predictive of protein production is a 
recognized conclusion of scientists in this field. 

20. I declare that all statements made herein of my own knowledge are true and that all
statements made on information and belief are believed to be true; and further, that these 
statements are made with the knowledge that willful, false statements and the like so made 
are punishable by fine or imprisonment or both, under Section 1001 of Title 18 of the United 
States Code and that such willful false statements may jeopardize the validity of the 
application or any patent issued thereon. 



Executed at [illegible] on [illegible], 2002.



Pia M. Challita-Eid 

Curriculum Vitae
PIA M. Challita-Eid, Ph.D 



Personal information

Work address: Agensys, Inc.
              1545 17th Street
              Santa Monica, CA 90404

Email: pchallita@agensys.com

Home address: 15745 Morrison Street
              Encino, CA 91436



Appointments:

Group Leader, Research Scientist III, Gene Discovery
Agensys, Inc.
October 2001-Present

Research Scientist II
Agensys, Inc.
August 2000-Present

Assistant Professor in Medicine, Microbiology & Immunology
University of Rochester Cancer Center
Hematology/Oncology Unit
July 1998-June 2000

Senior Instructor
University of Rochester Cancer Center
Department of Oncology
January 1996-June 1998



Education:

B.S. Biology
American University of Beirut-Lebanon
1984-1987

M.S. Microbiology
American University of Beirut-Lebanon
1987-1989

Ph.D. Microbiology
University of Southern California
Department of Microbiology
January 1990 - June 1994
Advisor: Donald B. Kohn, M.D., Associate Professor
Departments of Pediatrics and Microbiology
Division of Research Immunology and Bone Marrow Transplantation
Childrens Hospital of Los Angeles
University of Southern California, California, USA

Postdoctoral fellowship
University of California Los Angeles
Department of Hematology-Oncology
September 1994 - December 1995
Advisor: Joseph D. Rosenblatt, M.D., Assistant Professor
School of Medicine
Department of Hematology-Oncology
University of California, Los Angeles, California

Students and Research Associates Mentored:

Currently leading the Gene Discovery group of 6 research associates. Previous students and research
associates mentored are listed below.

1. Skelton Diane, Research Associate, 1992-1994.

2. El-Khoueiry Anthony, Undergraduate student, Summer 1992 and 1993. Currently Fellow at
the USC Medical Center.

3. Poles Tina, Research Associate, 1996-1998.

4. Mosammaparast Nima, Undergraduate student, June 1996 - September 1997. Currently
enrolled in Medical School.

5. Zoric Bojan, Undergraduate student, June 1997 - June 1998. Currently enrolled in Medical
School.

6. Rimel BJ, Research Associate, June 1998 - June 1999.

7. Vicki Houseknecht, Research Associate, June 1999 - June 2000.

8. Facciponte John, Graduate student in the Microbiology and Immunology Department at the
University of Rochester, January 1998 - June 2000. Currently a graduate student at Roswell
Park Cancer Center, Buffalo, NY.

9. Kyung Yi, Graduate Student in Microbiology, January 1999 - June 2000.

10. Anagha Joshi, Post-doctoral fellow, October 1999 - June 2000.

Patents: 

In the last year, I have been involved in the filing of greater than 40 applications. 



1) "Retroviral Vectors for Expression in Embryonic Cells", US5707865, issued date Jan. 13, 1998. 

2) "Chimeric Proteins for the Stimulation of a Tumor-Specific Immune Response", application 

in progress. 

Invited Presentations: 

October 1994 "Retroviral Vector Expression in Murine Stem Cells". Department of 

Hematology-Oncology, UCLA Gene Therapy Program, Los Angeles, California. 

October 1997 "Antibody Fusion Proteins for the Specific Recruitment and Activation of an 
Anti-Tumor immune Response". Childrens Hospital of Los Angeles, Los 
Angeles, California. 

February 1998 Regional Cancer Center Consortium for Biological Therapy. Roswell Park 
Cancer Institute, Buffalo, New York. 

July 1998 American Cyanamid Company. Lederle-Praxis Biologicals Division, Rochester, 
New York. 

October 1999 "Monoclonal Antibody Technology in the Era of Genetic Engineering" Brazilian 
Meeting on Biosafety and Transgenic Products, Rio De Janeiro, Brazil. 

June 1999 "Breast Cancer Research in the Era of Genetic Engineering", Breast Cancer 
Coalition of Rochester, Rochester, NY. 

Awards: 

Graduate Student Research Forum Award. Silencing of retroviral vectors after transduction of 
hematopoietic stem cells is associated with methylation. Graduate Student Research Forum 
Poster Session. USC Medical School, Los Angeles, California, 1993. 

Presidential Award. Society of Biological Therapy, Pasadena, California, October 1997. 

Merit Award. American Society of Clinical Oncology, California, May 1998. 

Grants/Funds: 

1) Jonsson Cancer Center Foundation/ UCLA 
Fellowship Seed Grant 

Title: "Antigen Processing in Human Neural Crest Tumors" 
Effective Dates: 11/1/95-10/31/96
Amount: $27,707 



2) Rochester Area Foundation 

Lucille B. Kesel Fund for the Advancement of Cancer Research

Title: "Antibody Fusion Proteins for Eradication of Minimal Residual Disease" 

Effective Dates: 1/1/98-12/31/98 

Amount: $8,000 

3) University of Rochester Cancer Center 
Interim and Pilot Project Funding 
P.I.: Joseph D. Rosenblatt, M.D. 
Co-P.I.: Pia M. Challita-Eid, Ph.D.

Title: "Antibody Fusion Proteins for the Therapy of Cancer"
Effective Dates: 1/1/98-12/31/98 
Amount: $25,000 

4) Sinsheimer Scholar Award 

Title: "Genetically-Engineered Chemokine Antibody Fusion Proteins for Breast and 

Ovarian Cancer Therapy" 
Effective Dates: 7/1/98-6/30/01 
Amount: $40,000/ year 

5) NIH/NCI 

P.I.: Joseph D. Rosenblatt, M.D. 
Co-P.I.: Pia M. Challita-Eid, Ph.D.

Title: "Recruitment and Activation of an Anti-tumor Response using Antibody-Fusion
Proteins" 

Effective Dates: 12/1/98-11/30/03 
Amount: $191,046/ year 

6) NIH/NCI - Rapid Access to Intervention Development (RAID) 

Title: "Preclinical Development of a B7.1 Anti-HER2/neu Antibody Fusion Protein" 
Effective Date: Approved April, 1999 
Amount: Not applicable 

7) ACS Institutional grant 

Title: "Chemokine Directed Targeting of Cytotoxic TALL-104 Cells" 
Effective Dates: 9/1/99-8/30/00 
Amount: $8,000 

8) Breast Cancer Coalition of Rochester 
Title: "Breast Cancer Research" 
Date: 9/99 

Amount: $1,000 
Gersuk GM, Westermark B, Mohabeer AJ, Challita PM, Pattamakom S, and Pattengale PK.
Inhibition of human natural killer cell activity by platelet-derived growth factor (PDGF).
III. Membrane binding studies and differential biological effects of recombinant PDGF
isoforms. Scand J Immunol 33: 521-532, 1991.

Gersuk GM, Carmel R, Challita PM, Rabinowitz AP, and Pattengale PK. Quantitative and 
functional studies of impaired natural killer (NK) cells in patients with myelofibrosis, 
essential thrombocytopenis, and polycythemia vera. I. A potential role for platelet- 
derived growth factor in defective NK cytotoxicity. Nat Immun 12: 136-151, 1993. 

Challita PM, and Kohn DB. Lack of expression from a retroviral vector in murine 

hematopoietic stem cells is associated with methylation in vivo. Proc Natl Acad Sci 
(USA) 91: 2567-2571, 1994. 

Krall W, Challita PM, Perlmutter L, Skelton D, and Kohn DB. Cells expressing human 

glucocerebrosidase from a retroviral vector repopulate macrophages and central nervous 
system microglia after murine bone marrow transplantation. Blood 83: 2737-2748, 1994. 

Challita PM, Skelton D, Yu XJ, El-Khoueiry A, Yu X-J, Weinberg KI, and Kohn DB. 
[...] cells. J Virol 69: 748, 1995.

Ucar K, Seeger RC, Challita PM,Watanabe CT, Yen TL, Morgan JP, Amado R, Chou E, 
McCallister T, Barber JR, Jolly DJ, Reynolds P, Gangavalli R, and Rosenblatt JD. 
Sustained cytokine production and immunophenotypic changes in human 
neuroblastoma cell lines transduced with a human gamma interferon vector. Cancer Gene 
Therapy 2: 171,1995. 

Lu Y, Planelles V, Palaniappan C, Li X, Challita-Eid PM, Amado R, Stephens D, Kohn DB, 

Bakker A, Day B, Bambara RA, and Rosenblatt JD. Inhibition of HIV-1 replication using 
a mutated tRNALys3 primer. J Biol Chem 272: 14523, 1997.

Challita-Eid PM, Penichet ML, Shin SU, Poles T, Mosammaparast N, Mahmood K, Slamon DJ, 
Morrison SL, and Rosenblatt JD. A B7.1-antibody fusion protein retains antibody 
specificity and ability to activate the T cell costimulatory pathway. J Immunol 160:
3419-3426, 1998.

Challita-Eid PM, Abboud CN, Morrison SL, Penichet ML, Rosell KE, Poles T, Hilchey SP, 
Planelles V, and Rosenblatt JD. A RANTES- antibody fusion protein retains antigen 
specificity and chemokine function. J Immunology 161: 3729, 1998.






Challita-Eid PM, Rosenblatt JD, Day B, Rimel BJ and Planelles V. Inhibition of HIV-1 infection 
with a RANTES.IgG3 fusion protein. AIDS Research and Human Retroviruses 14:1617, 
1998. 

Mahmood K, Federoff HJ, Challita-Eid PM, Day B, Haltman M, Atkinson M, Planelles V, and 
Rosenblatt JD. Eradication of pre-established lymphoma using HSV amplicon vectors. 
Blood 93: 643, 1999 

Penichet ML, Challita PM, Shin S-U, Sampogna S, Rosenblatt JD, and Morrison SL. In vivo 

properties of three human HER2/neu-expressing murine cell lines in immunocompetent 
mice. Laboratory Animal Science 49: 179-88, 1999. 

Penichet ML, Dela Cruz JS, Challita-Eid PM, Rosenblatt JD, Morrison SL. A murine B cell 

lymphoma expressing human HER2/ neu undergoes spontaneous tumor regression and 
elicits antitumor immunity. Cancer Immunol Immunother 49:649-62, 2001. 

Hilchey SP, Rosebrough SF, Morrison SL, Rosenblatt JD, and Challita-Eid PM. Specific 
targeting and stimulation of in vivo anti-tumor response using a B7.1 T-cell 
costimulatory antibody fusion protein. Manuscript in preparation. 



Select Abstracts and Presentations: 

Challita PM, El-Khoueiry AB, and Kohn DB. Silencing of retroviral vectors after transduction 
of murine hematopoietic stem cells is associated with methylation. Blood 80 (10 Suppl. 1): 
168a, 1992. 

Challita PM, Cook C, Sender LS, and Kohn DB. Novel retroviral vectors for consistent 

expression after transduction into hematopoietic stem cells. Keystone Symposium 
on Gene Therapy, Keystone, Colorado, 1993. 

Challita PM. Retroviral vector expression in murine stem cells. Presentation. Division 
of Hematology-Oncology, University of California Los Angeles, October, 1994. 

Challita PM, Shin S-U, Penichet M, Mahmood K, Poles TM, Rosell KE, Abboud CN, Morrison
SL, Rosenblatt JD. Novel Antibody Fusion Proteins for the Stimulation of a Tumor- 
Specific Immune Response. Keystone Symposium on Cellular Immunology and 
Immunotherapy of Cancer, Copper Mountain, Colorado, January 1997. 

Penichet ML, Challita PM, Shin S-U, Slamon DJ, Rosenblatt JD, and Morrison SL. In vivo 

properties of two human her2/neu expressing murine cell lines in immunocompetent 
mice. Multidisciplinary Approaches to Cancer Immunotherapy, Bethesda, Maryland,
June 1997.

[...] Rosenblatt JD. Characterization of a RANTES-antibody fusion protein for cancer
immunotherapy. Multidisciplinary Approaches to Cancer Immunotherapy, Bethesda,
Maryland, June 1997. 

Horwitz S, Rosenblatt JD, Mosammaparast N, Poles T, Abboud CN, and Challita PM. Gene- 
modified EL4 cells expressing the chemokine RANTES protects from tumor growth and 
stimulates an anti-tumor cytotoxic T-lymphocyte response in vivo. Multidisciplinary
Approaches to Cancer Immunotherapy, Bethesda, Maryland, June 1997. 

Challita-Eid PM, Morrison SL, Penichet ML, Rosenblatt JD. Antibody-T cell costimulatory 
ligand fusion protein for the stimulation of a specific anti-tumor immune response. 
American Society of Hematology, San Diego, California, December 1997. 

Challita-Eid PM, Abboud CN, Penichet ML, Rosell KE, Morrison SL, Rosenblatt JD. Antibody 
fusion proteins for the recruitment and activation of an anti-tumor immune response. 
American Association for Cancer Research. New Orleans, Louisiana, March 1998. 

[...] fusion protein induces effector cell infiltration to the site of HER2/neu expressing
tumors. AACR/NCI/EORTC Molecular Targets and Cancer Therapeutics, Washington
[...] Cancer, Santa Fe, New Mexico, January 2000.






Research Update 



TRENDS in Genetics Vol.17 No.12 December 2001, p. 685



A reassessment of the translation initiation codon in 
vertebrates 

Suraj Peri and Akhilesh Pandey 



More than two decades ago Marilyn Kozak 
proposed the scanning model of translation 
initiation, whereby translation is initiated at 
the first AUG codon that is in a particular 
context. In this article, we re-examine the
context of initiator codons using a large
dataset of curated human transcripts. We
find that more than 40% of transcripts
contain AUG codons upstream of the actual
start codon and that most authentic AUGs
contain three or more mismatches from the
consensus sequence, CCACCaugG. Also,
in a large fraction of transcripts, the 
sequences surrounding the initiator codon 
deviate more from the consensus than 
those surrounding upstream AUGs, 
indicating that translation initiation from 
downstream AUGs is more common than 
generally believed. 

In this article, we re-examine the position
requirement and the context dependence
for an AUG codon to be used for
translation initiation as proposed by
Kozak (Ref. 1). The evidence in favor of the
first-AUG rule is mainly derived from the fact
that the first AUG was used in about
90-95% of cases in a study of several
hundred vertebrate mRNAs (Ref. 2). The
context refers to the nucleotide sequence
surrounding the AUG (generally
accepted as being CCACCaugG) (Refs 3,4). This
'consensus' has been derived from
observing the sequences surrounding
the first AUG in exons as well as by
mutagenesis studies. Although these
mutagenesis studies provide data on
what might constitute an 'optimal'
sequence, they do not accurately predict
whether a given AUG found in a natural
mRNA transcript is likely to encode the
initiator methionine.

As a result of the complete sequencing
of the human genome, tens of thousands
of novel genes have been identified, and
their translation initiation sites need to be
predicted (Refs 5,6). Therefore, we decided to take
another look at two of the major issues
that underlie the scanning model of
translation initiation. For our analysis,
we have worked exclusively with
well-characterized and annotated human
mRNA sequences from the reviewed
RefSeq dataset (Ref. 7). RefSeq is a project by
the National Center for Biotechnology
Information (NCBI) to create curated
nonredundant entries for each gene and
contains predicted, provisional and
reviewed entries for mRNAs and proteins.
Reviewed entries are those where a
human curator has performed an
extensive manual verification, and by
using these entries, we hope to avoid the
annotation errors that otherwise abound
in databases (Ref. 8).

It has been proposed that certain 
classes of molecules, such as oncogenes, 
growth factors and receptors, are 
translated poorly and could contain a 
higher frequency of upstream AUG 
codons in their mRNAs as a mode of 
regulation (Ref. 2). We therefore subdivided
our dataset into two classes: transcripts 
encoding cytosolic molecules, and those 
encoding proteins that are secreted or 
bound to the plasma membrane (i.e. those 
products with signal peptides). We first 
examined the nucleotide sequences 
surrounding their established initiator 
codons. Second, we inspected whether 
any AUGs exist upstream of the actual 
initiator codon in the 5' untranslated 
regions (UTR) of these mRNAs. Last, 
because we found a significant number 
of transcripts with an upstream AUG, 
we examined whether there is any 
relationship between the size of the 
reported upstream 5' UTR and the 
number of unused upstream AUGs. 

Initiator codons in transcripts encoding 
cytosolic proteins 

We studied the sequence context of
AUGs from a dataset of 1534 reviewed
transcripts encoding cytoplasmic
proteins; the observed frequencies are
shown in Table 1. When considered
individually, the nucleotides that form
the consensus CCACCaugG are found in
32-53% of transcripts. When only -3 and
+4 positions are examined, only 46% of
transcripts contain a purine (A or G) at
-3 and a G at +4. Thus, over half of the
transcripts differ from what are believed
to be the most conserved nucleotide
positions (-3 and +4) surrounding the
AUG. We did not find that specific
nucleotides occurred at position -5 with
significantly higher frequency than
random (P-value < 0.05). The degree of
deviation of individual sequences from
the consensus was also calculated and is
discussed below.

Initiator codons in transcripts encoding 
cytokines, growth factors and receptors 
The assignment of an AUG as an initiator
methionine on the basis of genomic
sequences can be quite contentious.
Even when a protein sequence derived
from a cloned cDNA is used, there can
be disagreements on several issues.
Therefore, we have taken a biological
approach. Signal peptides are found at
the amino termini of secreted factors such
as cytokines and growth factors, as well
as of type I transmembrane receptors
that have their amino terminus located
extracellularly (Refs 9,10). These peptides are
approximately 15-30 amino acids long
and contain a stretch of hydrophobic
amino acids. Several excellent programs
are available that predict both the
presence of signal peptide and the
cleavage site (Refs 11,12). Because signal peptides



Table 1. Frequency of nucleotides surrounding the initiator codon of transcripts encoding
cytoplasmic proteins*

      -5      -4      -3      -2      -1      +1(A)   +2(T)   +3(G)   +4
A     17.07   21.77   47.00   30.37   20.99   100     0       0       19.36
T     19.29   10.16   5.60    10.56   6.12    0       100     0       13.36
G     31.42   28.61   37.15   18.57   27.64   0       0       100     [garbled: 4138-]
C     32.20   39.43   10.23   40.48   45.24   0       0       0       13.88

*Sequences surrounding the initiator codon of manually reviewed [...] proteins. The frequency
of occurrence of indicated nucleotides surrounding the initiator codon at positions -5 to
+4 with respect to ATG is shown.

Fig. 1. Analysis of sequence contexts surrounding initiator codons and upstream unused AUGs. (a) Mismatch
frequency of the nucleotides surrounding the initiator codon observed in natural transcripts as compared with
Kozak's consensus (CCACCaugG). This dataset is composed of 1534 transcripts encoding cytoplasmic proteins.
The random probability of occurrence of no, one, two, three, four, five or six mismatches is 0.02%, 0.4%, 3.3%,
13.2%, 29.7%, 35.6% and 17.8%, respectively. (b) Comparison of the contexts of upstream unused AUGs with the
authentic AUG, when CCACCaugG was considered as the optimal context (from a total of 2195 RefSeq transcripts).
Yellow, the percentage of transcripts that contained at least one upstream AUG with fewer mismatches than the
authentic AUG (i.e. the upstream AUG is in a more favorable context); green, the percentage of transcripts where
all upstream AUGs were in a similar context (indicated by the same degree of mismatch from the consensus);
orange, all other transcripts. (c) Comparison of the contexts of upstream unused AUGs with the authentic AUG
when a purine (A or G) at -3 and a G at +4 was considered as the optimal context (from a total of 2195 RefSeq
transcripts). The color code is the same as in (b).



can be recognized easily in these classes
of proteins, assignment of the actual
initiator methionine is obvious.
Additionally, the cDNAs of most of these
genes have been expressed in cells,
validating the assignment of the signal
peptides functionally.

We therefore compiled a list of
661 cytokines, growth factors and
receptors, and then tabulated the
nucleotide sequences surrounding the
initiator methionine residue from
transcripts encoding these proteins. As
in the case above, only 41.8% of the
transcripts contained a purine at -3 
and a G at +4 (data not shown). The 
observations from this dataset essentially 
paralleled the results shown in Table 1, 
indicating that cytokines and growth 
factors do not contain any atypical motifs. 

Frequency of initiator codons is not in 
agreement with the theoretical consensus 
'CCACCaugG'

We next decided to investigate how often 
a real initiator methionine from our 
dataset is in agreement with the 
consensus and to express any deviation 
from the consensus as the number of 
mismatches observed. If the surrounding 
sequences were almost or entirely 
identical to Kozak's consensus, then one 
would expect to find most proteins with no 
or a single mismatch. However, if they 
were more randomly distributed, then the 



number of mismatches would be around 
three or four (because having exactly five 
or six mismatches is more constrained in 
terms of probability). Interestingly, we 
found that only ^4% of transcripts
encoding cytosolic proteins had two or fewer
mismatches compared with the consensus
(Fig. 1a). This implies that a majority of
transcripts contain initiator codons that 
are not in close agreement with Kozak's 
consensus sequence. The same 
phenomenon was seen when proteins 
with signal peptides were considered
(data not shown). 
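The mismatch tally described above can be sketched as follows. This is a minimal illustration, assuming the consensus is scored at positions -5 to -1 (CCACC) and +4 (G), with the A of the AUG as +1; the function name, the 0-based `aug_index` argument and the handling of short flanks are assumptions made for the example, not the authors' implementation.

```python
CONSENSUS_UPSTREAM = "CCACC"  # positions -5 to -1
CONSENSUS_PLUS4 = "G"         # position +4

def count_mismatches(transcript, aug_index):
    """Number of mismatches (0-6) between the context of the AUG whose A is
    at 0-based aug_index and the consensus CCACCaugG. Returns None when the
    transcript is too short to score all six positions."""
    seq = transcript.upper().replace("T", "U")
    if aug_index < 5 or aug_index + 3 >= len(seq):
        return None
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("aug_index does not point at an AUG codon")
    upstream = seq[aug_index - 5:aug_index]
    mismatches = sum(a != b for a, b in zip(upstream, CONSENSUS_UPSTREAM))
    if seq[aug_index + 3] != CONSENSUS_PLUS4:
        mismatches += 1
    return mismatches
```

For instance, `count_mismatches("CCUCCAUGG", 5)` scores a single mismatch (U instead of A at -3).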

Frequency of upstream AUGs 
To determine how often the most 5' AUG 
is used, we decided to inspect the 
transcripts for the presence of AUGs 
that were upstream of the initiator 
methionine. Here, we expected there 
would be no upstream AUGs in most of 
these cases. However, again to our
surprise, we found that only slightly more
than half of the transcripts contained no
upstream AUG. In fact, 41% of transcripts
had one or more, and 24% of genes had
two or more upstream AUGs (data not
shown). This means that, whatever the 
reason, the second, third or a further 
downstream AUG is chosen for 
translation initiation in these cases. Of 
course, if one were to assign the first AUG 
as the initiator methionine in these 
transcripts, the predicted open reading 



frame (ORF) and the length of the 
encoded protein would be erroneous. The 
lack of any significant difference in the 
distribution of cytosolic proteins and 
those with signal peptides indicates that 
the class of proteins coded for by the 
mRNAs is not a reliable indicator of 
atypical behavior of mRNAs. 
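Counting upstream AUGs for a transcript with a known start codon could look like the sketch below. The function name and the 0-based `start_index` convention are assumptions for illustration; AUGs in all three reading frames are counted, which is an interpretation of the text rather than something it states.

```python
def upstream_augs(transcript, start_index):
    """Return the 0-based positions of every AUG whose first base lies
    upstream of the annotated start codon at start_index (any frame)."""
    seq = transcript.upper().replace("T", "U")
    return [i for i in range(start_index) if seq[i:i + 3] == "AUG"]

# A toy sequence (not from the dataset) with one AUG upstream of the start at 7.
print(upstream_augs("GGAUGCCAUGGCC", 7))
```

A transcript would then fall into the "no", "one" or "two or more" class according to the length of the returned list.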

It has been argued that it is the first 
AUG with a favorable context that is 
used for translation initiation. Therefore, 
we decided to compare the contexts of 
upstream unused AUGs with that of the 
authentic AUG in two ways. In the first 
method, we calculated the degree of 
mismatch for each of the upstream AUGs 
from the consensus, CCACCaugG, and 
compared it with the degree of mismatch 
of the authentic AUG. We divided the 
transcripts into three groups based on 
these results: (1) those that contained 
at least one upstream AUG in a more 
favorable context; (2) those where each 
of the upstream AUGs had a similar
context; and (3) those where either all the 
upstream AUGs were in a less favorable 
context or some were in a less favorable 
context and others a similar context.
In the second method, we divided the 
transcripts according to the degree of 
mismatch from a motif in which only 
two positions were considered as 
optimal; that is, a purine at -3 and a 
G at +4. The transcripts were again
divided into three groups as in the first 
method. The object of this comparison 
was to identify the number of cases in 
which either a more favorable or similar 
AUG codon existed upstream of the 
authentic AUG. These two groups 
(categories 1 and 2, above) would 
therefore represent transcripts in 
which the first AUG that had a 
favorable context was not chosen for 
translation initiation. 
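The three-way grouping can be expressed compactly once a mismatch score is available for the authentic AUG and for each upstream AUG. The sketch below is an illustration only: the function name is hypothetical, and it takes precomputed mismatch counts (lower meaning closer to whichever optimal motif is being used) so that it applies to either method.

```python
def classify_transcript(authentic_mismatches, upstream_mismatches):
    """Assign a transcript to category 1, 2 or 3 as defined in the text.
    Returns None when there are no upstream AUGs to compare."""
    if not upstream_mismatches:
        return None
    # Category 1: at least one upstream AUG in a more favorable context.
    if any(m < authentic_mismatches for m in upstream_mismatches):
        return 1
    # Category 2: every upstream AUG in a similar context.
    if all(m == authentic_mismatches for m in upstream_mismatches):
        return 2
    # Category 3: all less favorable, or a mix of less favorable and similar.
    return 3
```

Categories 1 and 2 together correspond to transcripts in which the first AUG with a favorable context was not the one chosen.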

The results according to the first
method show that 35% of transcripts
contained at least one upstream AUG
that was in a more favorable context
than the actual initiator, with 12%
of transcripts containing upstream 
AUGs in a similar context to the 
authentic AUG (Fig. 1b). Figure 1c shows 
essentially similar results when only a 
purine at -3 and a guanine at +4 were 
considered as the optimal motif 
(according to the second method). Our 
analysis therefore reveals that in almost
half the cases, there was at least one
upstream unused AUG codon that was in
a similar or better context than the
authentic AUG.

Fig. 2. The length of the 5' untranslated region (UTR) in
base pairs plotted against the number of transcripts
having no, one or ≥ two AUGs upstream of the actual
translation start codon. 2195 RefSeq transcripts were
analyzed. No significant difference was found between
datasets of transcripts encoding cytoplasmic proteins
and those with signal peptides.

The length of the 5' untranslated region 
is related to the number of upstream 
unused AUGs 

While we were performing this analysis, we
were intrigued by the fact that if the 5' UTR
was long, the transcripts invariably
contained upstream AUGs that were not
used. We therefore decided to investigate
this systematically. Figure 2 shows a
histogram of the number of transcripts
with no, one and ≥ two upstream AUGs
plotted against the length of the 5' UTR. 
Most of the transcripts (85%) with 100 bp 
or less of 5' UTR sequence do not contain 
any upstream AUGs, and only 2.6% contain 
two or more upstream AUGs. Quite the 
opposite is observed for transcripts with 
5' UTRs longer than 300 bp. In this case, 
70% of the transcripts contain two or more
upstream AUGs, and only 13.5% contained
no upstream AUGs. As the length of the
5' UTR increases, the number of 
transcripts with no upstream AUGs 
decreases and the number of transcripts 
with unused upstream AUGs increases. We 
see the same phenomenon in both classes 
of our dataset, indicating that it is not 
dependent on the type of protein being 
studied. Considering that the average 
length of the 5' UTR for genes in the human 
genome is 300 bp (Ref. 6), our data suggest
that one has to be quite careful with the
'first AUG' rule because it is probable that
the first AUG is not being used in a 
significant number of transcripts. 
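A tabulation of this kind can be sketched as a cross-count of 5' UTR length class against upstream AUG class. The text names only the "100 bp or less" and "longer than 300 bp" classes; the intermediate 101-300 bp bin, the function names and the toy records in the usage line are assumptions for illustration and are not data from the study.

```python
from collections import Counter

def utr_length_class(utr_length_bp):
    """Bin a 5' UTR length; the middle bin is an assumed boundary."""
    if utr_length_bp <= 100:
        return "<=100"
    if utr_length_bp <= 300:
        return "101-300"
    return ">300"

def upstream_aug_class(n_upstream_augs):
    """Bin the number of upstream AUGs as none, one, or two or more."""
    if n_upstream_augs == 0:
        return "0"
    return "1" if n_upstream_augs == 1 else ">=2"

def tabulate(records):
    """records: iterable of (utr_length_bp, n_upstream_augs) pairs."""
    return Counter(
        (utr_length_class(length), upstream_aug_class(n))
        for length, n in records
    )

# Toy input only, to show the shape of the result.
print(tabulate([(50, 0), (80, 0), (200, 1), (350, 3)]))
```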



Conclusions 

Our analysis essentially focused on
testing whether there is any consensus 
around the initiator codon in transcripts 
encoding known proteins. Transcripts 
that encode well-studied proteins provide 
a more applicable dataset, as these 
proteins are not predictions and have 
been worked on by scores of investigators 
worldwide. They are also good candidates 
to test the predictive value of Kozak's 
criteria when considering assignment of 
a given AUG in the transcript of a newly 
discovered gene. Our analysis shows that 
a large number of transcripts contain 
AUGs upstream of the actual translation 
start site, many of which are in a more 
favorable context than the codon used 
for translation initiation. Furthermore, 
our data shows that most of the AUGs 
used for translation initiation deviate 
significantly from Kozak's consensus. It 
is possible that mechanisms such as 
leaky scanning, reinitiation or internal 
initiation of translation have a much 
greater role than previously 
imagined (Refs 13-16). In support of this idea,
a growing number of transcripts have
recently been reported to undergo
internal initiation (Refs 17-19). For the purposes
of gene prediction and identification of 
translation start sites from genomic DNA 
or cDNA sequences, it is better to use 
homology-based alignments across 
protein families or across species to 
identify the initiator codon correctly, 
instead of relying solely on the most 
upstream AUG and its context.

Abstract

A primer extension (toeprinting) assay was used to monitor selection by ribosomes of the first versus the second AUG 
codon as a function of introducing mutations on the 3' side (positions +4, +5 and +6) of the first AUG codon. Six different
flanking codons starting with G (GCG, GCU, GCC, GCA, GAU and GGA) strongly augmented selection of AUG#1 when
compared with matched mRNAs that had A or C instead of G in position +4. Augmentation by G in position +4 failed
only when it was combined with U in position +5, as in the sequence augGUA. In contrast with the usual enhancing effect
of introducing G in position +4, most mutations in position +5 had no discernible effect, as shown with the series augANA
(where N = C, A, G or U) and the series augCNA. AUG codon recognition was also unaffected by mutations in position
+6, as shown by testing four mRNAs that had augCCN as the start site. Thus the primary sequence context that augments 
the recognition of AUG start codons does not appear generally to extend beyond G in position +4. When the toeprinting 
assay was used with mRNAs that initiate translation at CUG instead of AUG, cugGAU was not recognized better than 
cugGGU, contradicting the hypothesis that initiation at non-AUG codons might be favored by A instead of G in position 
+5. 

Keywords: initiation codon context/mRNA structure/protein synthesis/scanning model/translation



Introduction

Eukaryotic ribosomes appear to select the start site for translation by a scanning mechanism. The working hypothesis is
that the small, 40S ribosomal subunit, carrying Met-tRNAi^Met and various initiation factors, engages the mRNA at the
capped 5' end and migrates linearly until it encounters the first AUG codon. At the AUG codon, which is recognized by
base pairing with the anticodon in Met-tRNAi^Met, the 40S ribosomal subunit stops, the 60S subunit joins and the 80S
ribosome is poised to start protein synthesis. Evidence for this scanning mechanism and for the corollary first-AUG rule is
summarized elsewhere (Kozak, 1989a, 1992, 1995).

In higher eukaryotes, sequences flanking the AUG codon modulate its ability to halt the scanning 40S ribosomal subunit.
One of the modulating elements is the GCCACC motif in positions -6 to -1, immediately preceding the AUG codon
(Kozak, 1987). Mutations that weaken adherence to this consensus motif, especially mutations that substitute a
pyrimidine for the A in position -3, cause some 40S ribosomal subunits to bypass the first AUG and to initiate instead at
the next AUG downstream (Kozak, 1986a, 1989b; Lin et al., 1993; Ossipow et al., 1993). This context-dependent
'leaky scanning' has also been seen when the highly conserved G in position +4, immediately following the AUG codon, is
mutated (Kozak, 1986a, 1989b). Deviations in one or both of these key positions, and the resulting leaky scanning,
seem to account for the ability of certain mRNAs to produce two proteins by initiating translation from the first and
second AUG codons (Kozak, 1986b, 1991).
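As a rough illustration of the reasoning above, the sketch below checks the two key positions around the first AUG and, if either deviates, also reports the next AUG downstream as a possible start site. This is a deliberately crude, all-or-nothing caricature: real leaky scanning is quantitative, and the function names, the binary rule and the use of "purine at -3" rather than strictly A are assumptions made for the example.

```python
PURINES = {"A", "G"}

def key_positions(seq, aug_index):
    """Return (purine_at_minus3, g_at_plus4) for the AUG at aug_index
    in an uppercase RNA string. Missing flanks count as unfavorable."""
    minus3 = seq[aug_index - 3] if aug_index >= 3 else None
    plus4 = seq[aug_index + 3] if aug_index + 3 < len(seq) else None
    return (minus3 in PURINES, plus4 == "G")

def predicted_start_sites(mrna):
    """First AUG only if both key positions are favorable; otherwise the
    first and second AUGs (simplified leaky-scanning assumption)."""
    seq = mrna.upper().replace("T", "U")
    augs = [i for i in range(len(seq) - 2) if seq[i:i + 3] == "AUG"]
    if not augs:
        return []
    if all(key_positions(seq, augs[0])) or len(augs) == 1:
        return augs[:1]
    return augs[:2]
```

With the toy sequence `GCCACCAUGGCAAUGG` only the first AUG is returned, whereas changing -3 to U (`GCCUCCAUGGCAAUGG`) returns both AUGs.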

Two recent studies have raised the possibility that context effects on initiation might extend into the coding domain
beyond position +4. In one case, initiation at GUG appeared to be more efficient when the second codon was GAU
instead of GUA (Boeck and Kolakofsky, 1994). A companion study by Grünert and Jackson (1994) reported similarly
that initiation at an AUG or CUG start codon was favored by A in position +5 and U in position +6.

However, documenting the involvement of these or other nucleotides on the 3' side of the initiator codon might be
complicated by the fact that mutations introduced in these positions of the mRNA may change the amino acid sequence of
the encoded polypeptide. This could be a problem because the identity of the amino acid adjacent to the N-terminal
methionine can affect post-translational modifications which, in turn, can affect protein turnover. To circumvent possible
complications from post-translational events, an assay that directly monitors ribosome-mRNA initiation complexes was 
used in the present study to reinvestigate the question of whether nucleotides in positions +4, +5 and +6 affect the 
recognition of initiator codons. 

Correct definition of the context requirements for initiation is important for predicting translational start sites, which is an 
important aspect of interpreting cDNA sequences (Kozak, 1996).



Results

Preliminary test of mutations in positions +4, +5 and +6 

The mRNAs used for these experiments have two start codons and two open reading frames (ORFs), as outlined in Figure
1. ORF1, which extends from AUG#1 to a UAA codon overlapping Leu45 in the chloramphenicol acetyltransferase
(CAT) coding sequence, encodes a 70 amino acid polypeptide with a molecular mass of 8 kDa. This polypeptide is
designated p8^out (meaning out-of-frame with respect to CAT) or simply p8. ORF2 initiates with AUG#2, which is
in-frame with the downstream CAT coding sequence. ORF2 thus encodes a 240 amino acid polypeptide (the 219 amino acid
CAT protein with a 21 amino acid N-terminal extension), with a molecular mass of 28 kDa. The product of ORF2 is
designated p28^precat or simply p28. Because the sequence preceding AUG#1 includes U in position -3, which is
suboptimal, leaky scanning should allow these mRNAs to produce both polypeptides: p8 from AUG#1 and p28 from
AUG#2. The control mRNA in Figure 2A (lane 9) illustrates how this leaky scanning can be modulated by changes in
context. Because AUG#1 in the control has the optimal A in position -3, this mRNA produces a much higher yield of p8,
and a much lower yield of p28, than any other mRNA in this series. This fits with previous studies of mutations involving
sequences on the 5' side of the AUG codon (Kozak, 1986a, 1989b).

Fig. 1. Sequences of mRNAs used to study the effects of varying the nucleotide in
positions +4, +5 and +6. The sequence in the top line is common to all mRNAs in
this series. The 3' end of this sequence indicated by an ellipsis (. . .) leads to the
CAT coding domain (Kozak, 1989b). Not shown is the m7G cap at the 5' end of
all mRNAs. Mutations in the indicated positions of particular mRNAs were
introduced around the first AUG codon, which initiates translation of an 8 kDa
polypeptide (p8). In a different, overlapping reading frame, the second AUG codon initiates translation of a 28 kDa
polypeptide (p28) which is an N-terminally extended version of CAT. Because of the suboptimal context preceding
AUG#1 (notably the presence of U rather than A in position -3), some ribosomes would be expected to reach AUG#2 by
leaky scanning. Thus, each of the 12 test constructs should direct translation of both p8 and p28. The improved context (A
in position -3) in the control p-Con-1 should strongly shift translation in favor of p8. With the other control, p-Con-0, p28
should be the sole translation product because the upstream AUG codon is absent. Notice that mRNAs are named by
stating the three bases following the first AUG codon.

Fig. 2. Translation of mRNAs that vary in positions +4, +5 and +6 flanking the
first AUG codon. The autoradiograms show [3H]leucine-labeled proteins produced
in a rabbit reticulocyte translation system using mRNAs that have point mutations
in positions +4 and +5 (A) or position +6 (B). The mutations identified above each
lane were introduced around AUG#1, which initiates translation of p8. p28 results
from initiation at the invariant AUG#2. Figure 1 gives the 5' end sequences of
these mRNAs in full. A control mRNA that lacks AUG#1 (p-Con-0 in lane 1 of A) produced only p28. For the control
mRNA p-Con-1 (lane 9 in A; lane 5 in B), the context around AUG#1 was improved by changing position -3 from U to A,
thus enhancing synthesis of p8 and greatly reducing synthesis of p28. In (B), the slight residual translation of p28 evident
in lane 5 was abolished in lane 6 by introducing downstream the structure-prone sequence 8336, which is thought to slow
scanning and thus augment recognition of AUG#1 (Kozak, 1990a). This is shown only as an illustration, inasmuch as all
the other mRNAs used in this figure contained the unstructured sequence 8335 at the BamHI site. The conditions used for
translation (protein accumulation assay) and subsequent fractionation by polyacrylamide gel electrophoresis are described
in Materials and methods.


The present study asks whether mutations on the 3' side of AUG#1 can also modulate the selection of translational start 
sites. As shown in Figure 2A (lanes 2-8), the yield of p8 initiated from AUG#1 indeed varied at least 5-fold when point
mutations were introduced in positions +4 or +5. However, the scanning mechanism predicts that if, for example,
p-augAAA really supports initiation better than p-augAUA, as suggested by the 5-fold higher yield of p8 in lane 6 versus
lane 8, then the yield of p28 should be proportionately lower in lane 6. That prediction is not met. Instead, the only mRNA 
in the test series that shows both elevated p8 synthesis and reduced p28 synthesis is p-augGCA (lane 2), the construct that 
has G instead of U, C or A in position +4. 

With the other mRNAs tested in Figure 2A, a possible explanation for the variable yield of p8 without concomitant
reduction of p28 is that mutations in positions +4 and +5, which change the subterminal amino acid, thereby alter the
turnover of polypeptide p8. In this case, the amount of radiolabeled p8 that accumulates during the hour-long incubation
would not reflect the efficiency of initiation at AUG#1 accurately. To circumvent this potential problem, the mRNAs used
in Figure 2A were retested using a direct initiation assay.

To examine the effects of mutations in position +6, I chose a codon that specifies the same amino acid regardless of which
base occurs in position +6. Thus the N-terminal sequence of the nascent polypeptide is Met-Pro when translation initiates
at augCCG, augCCU, augCCC or augCCA. Among these four mRNAs there was no significant difference in the yield of
p8 in a standard translation assay (Figure 2B, lanes 1-4). These mRNAs were also retested using the initiation assay
described next.
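The choice of CCN for the position +6 series can be checked against the standard genetic code. The sketch below lists only the handful of codons needed for the comparison (an intentionally partial table, with hypothetical helper names): all four CCN codons encode proline, whereas varying position +5 in the augANA series changes the encoded residue each time.

```python
# Partial standard genetic code (RNA alphabet); only the codons needed here.
CODON_TO_AA = {
    "CCU": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
    "ACA": "Thr", "AAA": "Lys", "AGA": "Arg", "AUA": "Ile",
}

def encoded_residues(codons):
    """Set of amino acids specified by the given second codons."""
    return {CODON_TO_AA[c] for c in codons}

position6_series = ["CCG", "CCU", "CCC", "CCA"]  # +6 varies, residue constant
position5_series = ["ACA", "AAA", "AGA", "AUA"]  # +5 varies, residue changes

print(encoded_residues(position6_series))
print(sorted(encoded_residues(position5_series)))
```

This is why yields of p8 can be compared directly within the +6 series, while the +4 and +5 series are open to the protein-turnover caveat raised above.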

Direct analysis of AUG codon recognition using mRNAs with mutations in positions +4, +5 and +6

By using a reticulocyte lysate supplemented with sparsomycin and cycloheximide to inhibit elongation (see Materials and
methods), initiation complexes accumulate in which the ribosome is held at the AUG codon. The particular AUG start site
can be identified by using a primer extension inhibition assay in which a 32P-labeled deoxyoligonucleotide primer,
annealed to the mRNA downstream from all potential initiator codons, is extended by reverse transcriptase up to the 3'
edge of the bound ribosome. Figure 3 outlines how the assay works in principle.

Fig. 3. Schematic representation of a primer extension assay for mapping the
position of ribosomes on mRNA. The unextended 32P-labeled primer, represented
by the wide black line in step (b), is shown near the bottom of the polyacrylamide
gel in step (d). Extension of the primer with reverse transcriptase in the absence of
bound ribosomes proceeds to the 5' end of the mRNA. If ribosomes are allowed to
bind to the mRNA before the addition of reverse transcriptase, primer extension
halts prematurely; the exact size of the extension product(s) reveals which AUG
codon(s) were selected, taking into account that the leading edge of an 80S
ribosome extends ~15 nucleotides 3' of the AUG codon (Kozak and Shatkin, 1977).
The basic design of this 'toeprinting' assay was developed by Hartz et al. (1988) for studies with prokaryotic
ribosomes. The primer used in the present studies was 23 nucleotides long, the full-length extension product was
184 nucleotides and the extension inhibition products obtained when a ribosome was bound at AUG#1 or AUG#2 were
123 and 109 nucleotides, respectively.
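The product sizes in the caption allow a small consistency check. The sketch below assumes that the ribosome's leading edge sits at the same offset from the AUG at both sites, so the difference in product length equals the spacing between the two start codons; the variable and function names are illustrative.

```python
FULL_LENGTH_PRODUCT = 184  # nt, primer extended to the 5' end of the mRNA
TOEPRINT_PRODUCT = {"AUG#1": 123, "AUG#2": 109}  # nt, from the caption

def distance_from_5prime_end(product_length, full_length=FULL_LENGTH_PRODUCT):
    """Nucleotides of mRNA lying 5' of the reverse transcriptase stop."""
    return full_length - product_length

def spacing_between_sites(upstream_product, downstream_product):
    """Spacing between two ribosome positions, assuming an identical
    leading-edge offset at both sites (assumption)."""
    return upstream_product - downstream_product

spacing = spacing_between_sites(TOEPRINT_PRODUCT["AUG#1"],
                                TOEPRINT_PRODUCT["AUG#2"])
print(spacing, spacing % 3)
```

Under that assumption the two AUG codons are 14 nucleotides apart, which is not a multiple of three and so agrees with the statement that p8 and p28 are read in different, overlapping frames.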


Two control reactions in Figure 4 illustrate how the assay works in practice. With the control mRNA p-Con-1 in which
AUG#1 resides in a nearly optimal context (ACAaugG, see Figure 1), one prominent primer extension product is evident
in Figure 4A (lanes 11 and 12) and the size of this product indicates that it derives from ribosomes bound at AUG#1, the
start codon for p8. This primer extension product was absent when p-Con-0 mRNA was used for ribosome binding
(Figure 4A, lanes 1 and 2), consistent with the fact that p-Con-0 lacks the upstream AUG codon (see Figure 1). With
p-Con-0 the toeprinting assay maps ribosomes instead at the p28 start codon. (The p28 start site is labeled AUG#2 in Figure
4 because it is the second start codon in all mRNAs except p-Con-0.)

Fig. 4. Primer extension analysis of ribosome-mRNA complexes. Initiation at the
first and second AUG codons was monitored as a function of introducing mutations
around AUG#1. The assay is explained in Figure 3. The primer (PR) and primer
extension products are labeled along the left margin. (A and B) The mRNAs used
in lanes 3-10 varied only in position +4 or position +5, as indicated at the top of
each panel. The sequences of these mRNAs as well as the two control transcripts
(lanes 1, 2, 11 and 12) are depicted in full in Figure 1. Adjacent bracketed lanes
show that, with a given mRNA, the ratio of initiation at AUG#1 versus AUG#2
was the same when low (lanes 1, 3, 5, 7, 9 and 11) and 3-fold higher (lanes 2, 4, 6, 8, 10 and 12) levels of initiation
complexes were analyzed. Because of small variations in the amount of radioactivity applied to each lane, the important
comparison is not the intensity of the AUG#1 band from lane to lane, but the ratio of AUG#1 to AUG#2 in each lane.
These ratios are given in Table I. (C) Toeprint analyses with mRNAs that differed in position +6, as indicated above lanes
5-8. Lanes 1-4 display the complementary strand sequence of p-augCCA mRNA. A series of black dots within these
sequencing lanes highlight the C-A-T bands that correspond to the first, second and (silent) third AUG codons. When the
electrophoresis was run for longer, the foreshortened primer extension products caused by bound ribosomes could be
mapped, by reference to the sequencing lanes, 15-16 nucleotides downstream from the first and second AUG codons. In
the absence of ribosomes, the primer was extended all the way to the 5' end of the mRNA, as shown in a control
reaction (D).




The rest of Figure 4A tests the effects of introducing mutations in position +4 flanking AUG#1. Because all four test 
transcripts (the first four mRNAs in Figure 1) have U instead of the optimal A in position -3, some ribosomes are able to 
reach AUG#2 by leaky scanning. The question is whether the ratio of initiation at AUG#1 versus AUG#2 differs among 
these four mRNAs which are identical except for position +4. Quantitation of the data from Figure 4A (Table I, 
measurement 1, entries 1-4) indeed shows a 2.6-fold shift in favor of AUG#1 when that codon is followed by G in 
position +4 (henceforth written G+4). There was no real hierarchy among the other three nucleotides in position +4.



Table I. Effects of mutations in positions +4, +5 and +6 as monitored by primer extension analysis of ribosome-mRNA 
initiation complexes 




The toeprinting experiment was repeated in Figure 4B using four mRNAs that were identical except for position +5. 
Quantitation of these results showed no significant shift in the AUG#1/AUG#2 ratio (Table I, measurement 1, entries 5-8). 
Thus there was no evidence that the nucleotide in position +5 affects the selection of translational start sites. Nor was there 
any significant effect when mutations in position +6 were tested (Figure 4C; Table I, measurement 1, entries 9-12).

Because leaky scanning in cell-free translation systems was shown previously to be sensitive to the concentration of Mg2+
(Kozak, 1989b, 1990b), I repeated the test of mutations in positions +4, +5 and +6 at three different Mg2+
concentrations. The results of these toeprinting assays are shown in Figure 5 and the quantitation is given in Table I
(measurements 2, 3 and 4). As reported previously, when a given mRNA is tested at different Mg2+ concentrations, the
tendency to scan past AUG#1 and initiate instead at AUG#2 increases as the Mg2+ concentration is decreased. This can
be seen in Figure 5A, for example, by comparing the translation of p-augGCA in lanes 1, 5 and 9. The point is sustained
by comparing any other mRNA in Figure 5A at low, medium and high concentrations of Mg2+ (e.g. p-augUCA in lanes
2, 6 and 10; p-augCCA in lanes 3, 7 and 11; or p-augACA in lanes 4, 8 and 12).




Fig. 5. Selection of AUG start sites as a function of magnesium concentration and
sequence variation following AUG#1. (A) Twelve toeprinting reactions using
mRNAs that differed from one another only in position +4 following AUG#1. The
identity of the base in position +4 is marked above each lane. The primer extension
reactions were carried out after 4 min incubation in a reticulocyte translation
system in which the magnesium concentration had been adjusted to 1.35 (lanes 1-4),
2.02 (lanes 5-8) or 2.70 mM (lanes 9-12). The experiment was repeated using
mRNAs that differed from one another only in position +5 (B) or position +6 (C).
The mRNA sequences are given in full in Figure 1. Autoradiograms (similar to
Figure 4) have been cropped. Quantitation of these results is given in Table I.

The real purpose of this experiment was to compare, at a given concentration of Mg2+, the translation of four mRNAs that
differ in a single position downstream of AUG#1. In Figure 5A (lanes 1-4) this four-way comparison reveals that G is the
only nucleotide in position +4 that augments recognition of AUG#1 at low Mg2+, and that conclusion holds at moderate
(lanes 5-8) and high (lanes 9-12) Mg2+ concentrations. In Figure 5B, a similar experiment using four mRNAs that differ
only in position +5 shows that, at any given Mg2+ concentration, AUG#1 is recognized with equal efficiency irrespective
of nucleotide changes in position +5. Figure 5C shows that, at any given Mg2+ concentration, recognition of AUG#1 is
indifferent also to the nucleotide in position +6. The conclusion from this analysis is that G+4 appears to be the only
nucleotide on the 3' side of the AUG codon that augments initiation.

Distinguishing between particular codon effects and generalized context effects 

Although G+4 augments AUG codon recognition under a variety of reaction conditions, as shown above, in all those
studies the G in position +4 was part of the codon GCA. To determine if the augmentation is attributable specifically to
G+4 or if it is the flanking codon GCA that happens to favor initiation, I tested mRNAs that had six different GNN codons
adjacent to AUG#1. In the toeprint analyses shown in Figure 6A, each mRNA was compared with a matched construct
that had C or A instead of G in position +4. Quantitation of the results (Table II) reveals that AUG#1 was indeed
recognized ~3-fold better in five out of six cases where G+4 was the flanking nucleotide. Since five different flanking
codons starting with G (GCG, GCU, GCC, GCA and GAU) strongly augmented the recognition of AUG#1, it seems
reasonable to attribute the enhancement to the G residue in position +4 rather than to a particular flanking codon.




Fig. 6. Additional testing of various codons following AUG#1. Primer extension assays were
carried out using a reticulocyte translation system with the Mg2+ concentration adjusted to
1.7 mM. The codon flanking AUG#1 is identified above each lane of the gel. Except for this
first codon variation, the mRNA sequences were as given in Figure 1. (A) The positive effect of
G+4 is seen with a variety of codons flanking AUG#1. The first eight lanes, compared two at a
time, show the shift in initiation when AUG#1 is followed by G versus C in position +4. Thus
the flanking codons are GCG versus CCG in lanes 1 and 2, GCU versus CCU in lanes 3 and
4, GCC versus CCC in lanes 5 and 6 and GCA versus CCA in lanes 7 and 8. The last four lanes
show the shift when AUG#1 is followed by G versus A in position +4. The flanking codons are
GAU, lane 9; AAU, lane 10; GUA, lane 11; and AUA, lane 12. Quantitation of these results is
given in Table II. (B) The stimulatory effect of G+4, evident when AUG#1 is followed by GAU
(lane 1) or GGA (lane 6), fails when the flanking codon is GUN (lanes 2-5). (C) Variations in position +5 do not
significantly affect recognition of AUG#1, as shown with CNA as the flanking codon (lanes 1-4). The mRNAs used in
lanes 5 and 6 are controls. Warming during electrophoresis slightly retards the mobility of samples near the edges of the
gel.




Table II. The positive effect of G in position +4 occurs with a variety of codons flanking AUG#1 




In Figure 6A, augGUA was the only mRNA in which G unexpectedly failed to enhance initiation. To determine whether
it is specifically the flanking codon GUA that disfavors initiation or whether the 3' sequence GU somehow undermines
recognition of the preceding AUG codon, I tested initiation at augGUG, augGUU and augGUC along with augGUA.
Figure 6B (lanes 2-5) shows equally poor recognition of AUG#1 with all four constructs in this series. Thus the usual
stimulatory effect of G+4, seen in Figure 6B with the control transcripts augGAU and augGGA (lanes 1 and 6), fails for
some reason when G+4 is followed by U+5.

Although U in position +5 prevents the usual stimulatory effect of G+4, U in position +5 is not generally deleterious.
Thus, augAUA was not recognized less efficiently than augACA, augAAA or augAGA in Figure 5B. The point is
confirmed in Figure 6C where augCUA (lane 4) was recognized as efficiently as augCCA, augCAA or augCGA (lanes 1-3).
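The observations so far reduce to a simple empirical rule, sketched below: the codon following the AUG augments recognition when it begins with G, unless that G is followed by U. The function name is hypothetical, and the rule is only a summary of the constructs reported here, not a general predictor of initiation efficiency.

```python
def g4_augments(flanking_codon):
    """True if the codon following the AUG is expected, on the basis of the
    constructs tested above, to augment AUG recognition: G at +4 and a
    base other than U at +5. Position +6 is ignored."""
    codon = flanking_codon.upper().replace("T", "U")
    if len(codon) != 3:
        raise ValueError("expected a three-base codon")
    return codon[0] == "G" and codon[1] != "U"

# Codons reported as augmenting versus the GUN series that did not.
for codon in ["GCG", "GCU", "GCC", "GCA", "GAU", "GGA", "GUA", "GUG"]:
    print(codon, g4_augments(codon))
```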

Effects of mutations flanking a CUG start codon 

In view of some earlier reports about effects of downstream mutations (Boeck and Kolakofsky, 1994; Grünert and
Jackson, 1994), it seemed useful to retest some of the foregoing conclusions with mRNAs that initiate translation at a
non-AUG codon. In the mRNAs depicted in Figure 7A, CUG replaces AUG#1 as the start codon for p8. When these
mRNAs were used as templates in a standard translation assay, some [3H]leucine-labeled p8 was produced (Figure 7B,
lanes 2-5), albeit less than with AUG as the start site for p8 (Figure 7B, lanes 1 and 6). That AUG as the p8 start codon is
much stronger than CUG is also evident from the greater inhibition of p28 synthesis in lanes 1 and 6 compared with lanes
2-5 in Figure 7B. The complete absence of p8 when the CUG codon was mutated (Figure 7C, lane 1) confirms that CUG
is the source of p8 in lanes 2-5.




Fig. 7. Effects of mutations in positions +4 and +5 on recognition of a CUG initiator codon. (A) Sequences of the mRNAs
used for translation. The major difference from Figure 1 is that AUG#1 has been replaced here by a CUG start codon. To
compensate for the weakness of the CUG codon, the preceding sequence includes the optimal A in position -3. Each
construct was tested with an unstructured sequence (8335) and a secondary structure-forming sequence (8336)
downstream. (B and C) Autoradiograms of [3H]leucine-labeled proteins resulting from translation under standard
conditions (2 mM Mg2+) of the mRNA indicated above each lane. The autoradiogram in (C) was exposed twice as long as
that in (B). (D) Cropped autoradiograms of toeprinting reactions carried out with p-cugAGU mRNA (lanes 3 and 4),
p-cugAAU (lanes 5 and 6), p-cugGGU (lanes 7 and 8) and p-cugGAU (lanes 9 and 10). Lanes 1 and 2 show a control
reaction with mRNA #9 which lacks the upstream CUG start site. For each mRNA, adjacent bracketed lanes show
toeprinting reactions carried out using the first two 32P-containing fractions eluted from the Sepharose CL-4B column.
Because of difficulty in synchronizing the collection when several columns are run at the same time, the first fraction
(lanes 1, 3, 5, 7 and 9) contains less radioactivity in some cases. Consequently, the mRNAs are most easily compared by
focusing on lanes 2, 4, 6, 8 and 10, where equal radioactivity was applied.




Because earlier studies showed that appropriately positioned downstream secondary structure aids the recognition of weak
initiator codons (Kozak, 1990a), I tested the CUG mRNAs with both an unstructured sequence (oligonucleotide 8335)
and a structure-forming sequence (oligonucleotide 8336) downstream. The inclusion of oligonucleotide 8336 significantly
elevated initiation from the CUG codon in Figure 7B (compare the yield of p8 in lane 4 versus lane 2, or in lane 5 versus
lane 3); therefore, this downstream sequence was retained in the mRNAs used in Figure 7C and D.

In Figure 7C, I examined the effects of mutations in position +5 flanking the CUG start codon for p8. I specifically tested
A versus G in this position because Grünert and Jackson (1994) reported the biggest effect when A+5, which they
considered optimal, was changed to G+5. Although I too observed a higher yield of p8 with the mRNA that initiates at
cugGAU instead of cugGGU (lanes 4 and 5 in Figure 7B and C), closer inspection of the results argues against
concluding that A+5 enhances initiation at CUG. Notice, for example, that the yield of p8 was not augmented by A in
position +5 when the flanking codons were AGU versus AAU (Figure 7C, lanes 2 and 3) instead of GGU and GAU.

Because differences in protein stability might distort the results of the protein accumulation assays in Figure 7B and C (for
example, N-terminal acetylation might stabilize the form of p8 initiated from cugGAU), the critical test was whether
mutations in position +5 would affect recognition of the CUG codon when initiation was monitored directly, using the
primer extension assay. As shown in Figure 7D, although G in position +4 augmented recognition of the upstream CUG
codon (compare cugAGU with cugGGU, lanes 4 and 8; or compare cugAAU with cugGAU, lanes 6 and 10), there was
no convincing difference between matched mRNAs that had G versus A in position +5 (compare cugAGU with cugAAU
in lanes 4 and 6; or compare cugGGU with cugGAU in lanes 8 and 10).

Notice that the extended context in these CUG-containing mRNAs (GACAUAcugRRU) is the same as that used by
Grünert and Jackson (1994).



Discussion

The optimal context for initiation does not extend beyond G in position +4 

These experiments refute the suggestion that the recognition of initiator codons is strongly favored by A in position +5
and U in position +6, as proposed by Boeck and Kolakofsky (1994) and Grünert and Jackson (1994). By using an
assay that directly monitors the initiation step of translation, I found no effect on recognition of the first AUG codon when
position +5 was varied in the series ACA, AAA, AGA, AUA (Table I, entries 5-8) or the series CCA, CAA, CGA, CUA
(Figure 6C). The efficiency of initiation at AUG#1 was also affected very little when position +6 was varied in the series
CCG, CCU, CCC, CCA (Table I, entries 9-12) or the series GCG, GCU, GCC, GCA (Table II, entries 1, 3, 5 and 7).
Because I did not test all possible combinations, these experiments do not rule out the possibility that, as part of some
particular sequence, A+5 and U+6 might be preferable to some alternative sequence (see below); but the experiments do
preclude generalizing the optimal context for initiation to include A+5 and U+6.

The experiments herein do suggest, on the other hand, that the positive effect of G+4 is generalizable. As illustrated in
Figure 6, recognition of AUG#1 improved in response to six different codons that introduced G in position +4: GCG,
GCU, GCC, GCA, GAU and GGA. A strong contribution of G+4 was also seen in the experiment in which a CUG codon
was used in place of AUG#1 (Figure 7D). Speculation that the frequent occurrence of G in position +4 might simply
reflect selection for Ala, Gly and Val as the penultimate amino acids (Flinta et al., 1986), rather than a role for G in
initiation, no longer seems valid. The experiments herein, using an assay that directly monitors ribosome-mRNA initiation
complexes, show that recognition of AUG start codons is stimulated strongly by G+4.



augGU is not a favorable context for initiation 

The one interesting situation in which G in position +4 failed to augment recognition of AUG#1 involves the sequence
augGUN. As shown in Figure 6A, for example, augGUA (lane 11) was recognized with only the same low efficiency as
augAUA (lane 12). At first glance, the data in Figure 6A (e.g. lane 9 versus lane 11) seem to confirm an earlier report that
(gug)GAU is a much stronger initiation site than (gug)GUA (Boeck and Kolakofsky, 1994). That observation
contributed to the idea that A+5 and U+6 might be part of the optimal context for initiation. However, the more extensive
set of data in the present study shows it is not that A+5 and U+6 augment initiation, but that the usual stimulatory effect of
G+4 fails in the case of augGUA. Table II shows, for example, that augAAU (entry 10) is not recognized significantly
better than augAUA (entry 12). Instead, Table II shows unexpectedly low recognition of AUG#1 when the flanking codon
is GUA (entry 11) compared with every other mRNA that has G in position +4 (entries 1, 3, 5, 7 and 9).

This unexpected deficiency is not limited to the sequence augGUA. Figure 6B shows that the usual stimulatory effect of
G+4 fails with every codon that begins with GU. The simplest interpretation is that the sequence GU in positions +4/+5
somehow distorts the mRNA and thus impairs AUG codon recognition by the scanning 40S ribosomal subunit.
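
The position numbering used throughout (A of the start codon is +1, so +4 is the first base after the codon and -3 lies three bases upstream) can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the yes/no rule simply encodes the qualitative conclusion stated above (G in +4 helps unless U follows in +5); it is not a quantitative model of initiation efficiency.

```python
def flanking_context(mrna: str, start: int) -> dict:
    """Return the bases at positions -3, +4, +5 and +6 around a start codon.

    `start` is the 0-based index of the first base of the codon (position +1).
    There is no position 0, so -3 is simply three bases upstream of `start`.
    """
    seq = mrna.upper()
    return {
        "codon": seq[start:start + 3],
        "-3": seq[start - 3] if start >= 3 else None,
        "+4": seq[start + 3],
        "+5": seq[start + 4],
        "+6": seq[start + 5],
    }


def g4_expected_to_help(ctx: dict) -> bool:
    """Qualitative rule taken from the text: G in +4 stimulates unless +5 is U."""
    return ctx["+4"] == "G" and ctx["+5"] != "U"


# Extended context quoted for the CUG constructs, written here with R = G in both positions.
cug_ctx = flanking_context("GACAUACUGGGU", start=6)
```

With the construct names used in the figures, `flanking_context("AUGGUA", 0)` has G in +4 followed by U in +5, so the rule returns False, whereas augGAU returns True.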

Because the deleterious effect seems to be attributable to a particular flanking sequence (augGU) rather than to a
particular flanking codon, it is not likely that the defect occurs after assembly of the 80S ribosome when a tRNA tries to
enter the A site. It seems unlikely, for example, that augGUA is a poor initiation site because the complementary
Val-tRNA is scarce (Zhang et al., 1991) or because Val-tRNA is structurally incompatible with Met-tRNAi when both
tRNAs line up on the 80S ribosome (Irwin et al., 1995). Those explanations might be tenable if the defect were limited
to augGUA; but augGUG, augGUU and augGUC were equally poor start sites. Experiments described herein specifically
contradict the idea that scarcity of the elongator tRNA required to form the first peptide bond at AUG#1 might shift
initiation to a downstream site. That hypothesis appears to be ruled out by the results shown in Figure 6, where augGCG
(Figure 6A, lane 1) and augGGA (Figure 6B, lane 6) were recognized efficiently despite the low abundance in
reticulocytes of tRNAs corresponding to GCG and GGA (Hatfield et al., 1979, 1982).

Initiation is best studied with an initiation-limited assay 

The initiation-limited assay used here obviates two problems that can confuse assessment of the degree to which leaky
scanning, caused by mutations around AUG#1, allows access to AUG#2. One problem with standard protein synthesis
assays is that elongating ribosomes advancing from the upstream start site can block access to a second start site
downstream, thus making AUG#1 appear stronger (less leaky) than it really is (Fajardo and Shatkin, 1990; Kozak, 1995).
This sort of distortion, called elongational occlusion, was avoided in the present study by using inhibitors that prevent
ribosomes from advancing beyond the initiation step.

A second potential problem is that varying bases +4, +5 and +6, and hence varying the penultimate amino acid, might
change the stability of the test protein. This was argued not to be relevant in other studies (Boeck and Kolakofsky, 1994;
Grünert and Jackson, 1994) because the amino acid changes that would result from the mutations in positions +5 and +6
should not have rendered the polypeptide unstable according to the N-end rule. However, the beautifully elucidated N-end
rule pathway for protein turnover applies to proteins derived by proteolytic processing (Bachmair et al., 1986; de Groot
et al., 1991; Gonda et al., 1989; Varshavsky, 1995). One should not necessarily expect the predictions of the N-end
rule to apply to nascent polypeptides in which the subterminal amino acid is varied. Unlike proteolytically derived
polypeptides, the N-terminus of nascent polypeptides is subject to modification by methionine aminopeptidases,
acetyltransferases, N-terminal amidases and other enzymes that may affect protein stability (Moerschell et al., 1990;
Kendall and Bradshaw, 1992; Stewart et al., 1994; Baker and Varshavsky, 1995). It is not known whether the extent
of these N-terminal modifications varies among batches of reticulocyte lysate, or whether the high level synthesis of some
proteins in vitro might exceed the capacity of endogenous modifying enzymes. Because of these uncertainties, it seems
dangerous to assume that differences in protein accumulation in response to mutations in positions +4, +5 and +6 reflect
an effect of these nucleotides on the initiation of translation.

The present study gets around this concern by replacing the customary protein accumulation assay with one that directly
monitors the initiation step of translation. Indeed, had I relied simply on measurement of protein yields, I might have
concluded that initiation at augAAA was 5-fold more efficient than at augAUA (Figure 2A, lanes 6 and 8). However,
those two start sites functioned with identical efficiencies when initiation was assayed directly (Figure 5B; Table I, entries
6 and 8).

Non-AUG start sites have the same flanking sequence requirements as AUG start sites 

There is no compelling evidence for cis-acting elements in mRNAs that act uniquely at CUG, ACG and GUG start sites.
Instead, non-AUG start codons seem to show a stronger dependence on the same ancillary features that augment AUG
codon recognition. In some studies, for example, mutation of A-3 nearly abolished initiation from a CUG or ACG codon
(Peabody, 1987; Portis et al., 1994). In the experiments described herein, initiation at a CUG codon was barely
detectable in the absence of G+4 (Figure 7C and D). The enhancing effect of downstream secondary structure, previously
demonstrated for AUG start sites (Kozak, 1990a), was also evident with CUG start sites in Figure 7B. The strong
dependence on these ancillary features probably follows from the fact that alternative start codons can form only two,
instead of the usual three, base pairs with the anticodon in eukaryotic Met-tRNAi. Prokaryotes are similar in the sense
that initiation at a weak UUG start site requires an unusually strong Shine-Dalgarno interaction (Weyens et al., 1988).

In an earlier study, Boeck and Kolakofsky (1994) postulated that A+5 and U+6 specifically augment initiation at
non-AUG start sites, but they did not include tests with AUG in place of GUG. A companion paper (Grünert and Jackson,
1994) reported, on the other hand, that the effects of mutating positions +5 and +6 around a CUG start codon were
qualitatively similar to the effects at an AUG codon.

In the present study, most of the mutations were introduced around AUG codons because the poor initiation at non-AUG 
codons, even in the best of circumstances, makes quantitation difficult. However, the experiment shown in Figure 7D 
using a CUG start site confirms the conclusion reached for AUG start sites: that recognition of the initiator codon 
improves when G is substituted for A in position +4, while substitutions in position +5 have no discernible effect.

There is also no compelling evidence for trans-acting factors in eukaryotes that specifically recognize non-AUG start
codons. Some interesting experiments in yeast in which certain mutations in eIF-2 were shown to activate a silent UUG
codon (Donahue et al., 1988; Dorris et al., 1995) are sometimes cited as evidence for a UUG-specific initiation factor.
However, augmented initiation at UUG could be explained if the mutations in eIF-2 simply enhance its non-specific
binding to mRNA. This could allow erroneous initiation events (i.e. use of a codon that only partially matches the
anticodon in Met-tRNAi) in the same way that streptomycin induces errors during polypeptide elongation by
strengthening non-specific contacts between ribosomes and tRNA, and thus decreasing dependence on specific
codon-anticodon contacts.



Materials and methods

Construction of plasmids 

Plasmids used herein were derived from Riboprobe vector pSP64 (Promega Corp.) into which a CAT cartridge
(Pharmacia Biotech) was previously inserted at the BamHI site (Kozak, 1989b). The vector had been modified
previously by introducing a T7 phage promoter (Kozak, 1994) followed by the sequence
GAAGCTAAAACAAATCAATCAATCAAAACACAAGCTT. This synthetic 5' non-coding sequence, which is devoid
of secondary structure, was chosen because it supports efficient translation when an appropriate initiator codon is
introduced downstream. Between the HindIII site (AAGCTT, at the 3' end of the sequence above) and a nearby BamHI
site marked in Figure 1, I inserted synthetic deoxyoligonucleotides that contain two ATG (AUG) codons, as illustrated in
Figure 1. Using the cassette mutagenesis technique depicted in Figure 1, I systematically varied the codon on the 3' side
of AUG#1. The plasmids and resulting mRNAs were named according to the sequence following the first start codon for
translation: p-augGCA, etc., in Figure 1; p-cugAGU, etc., in Figure 7.

Because the presence of secondary structure appropriately positioned downstream from an AUG or CUG codon can
augment initiation (Kozak, 1990a), two different sequences were used downstream. Beginning at the BamHI site marked
in Figure 1, the sequence was either GAUCCAAAGUCAGCCAAAUCAA (oligonucleotide 8335) or
GAUCCGGGUUCUCCCGGAUCAA (oligonucleotide 8336). The latter sequence can form a stem-loop structure with a
stability of -19 kcal/mol (Kozak, 1990a). Constructs that contain oligonucleotide 8336 are identified explicitly in the
text and figures. All mRNAs discussed without mentioning the downstream sequence contained the structure-free
oligonucleotide 8335, as in the mRNAs depicted in Figure 1.
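
The difference between the two downstream sequences can be checked with a small script. The sketch below is an illustration only: the function name and the minimum loop size of three bases are assumptions, only Watson-Crick pairs are counted (G-U wobble pairs are ignored), and no free energy is computed, so it does not reproduce the -19 kcal/mol figure. It merely looks for the longest uninterrupted stem that could close a hairpin.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately ignored in this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def longest_hairpin_stem(rna: str, min_loop: int = 3) -> int:
    """Length of the longest uninterrupted stem that can close a hairpin loop."""
    rna = rna.upper()
    best = 0
    for i in range(len(rna)):
        for j in range(i + min_loop + 1, len(rna)):
            k = 0
            # Pair base i+k with base j-k, moving inward, while the bases are
            # complementary and at least `min_loop` unpaired bases remain between them.
            while (j - k) - (i + k) - 1 >= min_loop and (rna[i + k], rna[j - k]) in PAIRS:
                k += 1
            best = max(best, k)
    return best


OLIGO_8335 = "GAUCCAAAGUCAGCCAAAUCAA"  # described in the text as structure-free
OLIGO_8336 = "GAUCCGGGUUCUCCCGGAUCAA"  # described in the text as stem-loop forming

stem_8335 = longest_hairpin_stem(OLIGO_8335)
stem_8336 = longest_hairpin_stem(OLIGO_8336)
```

For oligonucleotide 8336 the search finds an 8 bp stem (GAUCCGGG paired with CCCGGAUC around a UUCU loop); oligonucleotide 8335 yields only a much shorter run of complementary bases.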

Standard recombinant techniques used for these constructions were described previously (Kozak, 1989b). Plasmids were
propagated using Escherichia coli RR1 (Gibco/BRL). The structures of all plasmids were confirmed by appropriate
dideoxy chain-termination sequencing reactions using Sequenase-2 (U.S. Biochemical Corp.).

Synthesis of capped mRNAs 

CsCl-purified plasmid DNA, linearized by digestion with AvaI, was used as the template for transcription by T7 RNA
polymerase. Transcription reactions were carried out at 37°C as described previously (Kozak, 1989b) except that, after a
12 min incubation with m7GpppG caps (10 U/ml, Pharmacia Biotech), the GTP concentration was increased to 500 µM
and incubation was continued for another 60 min. The reactions contained RNase inhibitor (150 U/ml, Gibco/BRL).

To ensure uniformity, all transcripts intended for a given translation experiment were synthesized using aliquots from a
common reaction mixture, which included a trace of [3H]UTP to facilitate quantification. mRNAs were extracted with
phenol and purified by application to pre-spun Sephadex G50 columns (Boehringer Mannheim).

Complete translation assay 

For the standard protein accumulation assay, a rabbit reticulocyte translation system supplemented with [3H]leucine
(140 µCi/ml, sp. act. 180 Ci/mmol) was programmed with mRNA (12 µg/ml) and incubated for 1 h at 30°C. The Flexi
reticulocyte lysate (Promega Corp.), which constituted 50% of the final reaction volume, was supplemented with 60 mM
KCl and 19 non-radioactive amino acids at 20 µM each. In addition to the endogenous Mg (stated by the supplier for
each batch of lysate), reactions were supplemented with Mg(CH3COO)2 to give a final Mg concentration of 2 mM,
unless otherwise stated in the text or figure legends. A standard Mg concentration of 2 mM was chosen because it was
shown previously to support a pattern of context-dependent initiation in vitro similar to what is seen in vivo (Kozak,
1989b). To minimize variation, aliquots from a common reaction mixture were used for translation of all mRNAs in a
given experiment.
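
The magnesium adjustment described above is simple arithmetic, sketched here under two stated assumptions: that the supplier's endogenous Mg value refers to undiluted lysate, and that the lysate is the only endogenous source of Mg in the reaction. The function name and the 1.8 mM example value are hypothetical; only the 50% lysate fraction and the 2 mM target come from the text.

```python
def mg_supplement_mM(endogenous_mM: float,
                     lysate_fraction: float = 0.5,
                     target_mM: float = 2.0) -> float:
    """Mg(CH3COO)2 to add (final mM) so that total Mg reaches `target_mM`.

    Assumes `endogenous_mM` is the Mg concentration of the undiluted lysate and
    that the lysate makes up `lysate_fraction` of the final reaction volume.
    """
    contributed = endogenous_mM * lysate_fraction
    supplement = target_mM - contributed
    if supplement < 0:
        raise ValueError("endogenous Mg already exceeds the target concentration")
    return supplement


# Hypothetical batch: 1.8 mM endogenous Mg in the undiluted lysate.
example_supplement = mg_supplement_mM(1.8)
```

For this hypothetical batch the lysate contributes 0.9 mM, so about 1.1 mM Mg(CH3COO)2 would be added to reach 2 mM.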

Translation products were analyzed by polyacrylamide gel electrophoresis as described previously (Kozak, 1989b). The
gels were impregnated with Entensify (DuPont NEN) before autoradiography with Kodak X-omat AR film at -70°C.

Primer extension assay of initiation complexes 

Prior to ribosome binding, each mRNA was annealed with a 32P-labeled deoxyoligonucleotide that would serve to prime
the final reverse transcriptase step. The 23 nucleotide primer CTCAAAATGTTCTTTACGATGCC is complementary to
codons 16-23 in the CAT coding domain. The primer was first labeled at the 5' end by incubation with T4 polynucleotide
kinase and [γ-32P]ATP (3000 Ci/mmol). An aliquot of the 32P-labeled primer was then incubated with mRNA (~5 pmol
of each) in 11 µl of 50 mM Tris-HCl (pH 7.5) for 2 min at 65°C followed by 10 min at 37°C. The primer-mRNA
complexes were transferred to wet ice and held briefly while the reticulocyte reaction mixtures were assembled.

A rabbit reticulocyte lysate was used under the conditions described above except that [3H]leucine was omitted and
inhibitors of elongation (sparsomycin at 200 µM and cycloheximide at 90 µg/ml) were added. These inhibitors cause
accumulation of initiation complexes in which the 80S ribosome is held at the AUG codon. Aliquots of a common
reaction mixture were dispensed to glass tubes which were pre-incubated for 2 min at 30°C before adding the
mRNA-primer complexes. After 4 min incubation at 30°C to allow ribosomes to engage the mRNA, the samples were
applied to Sepharose CL-4B columns (15 × 0.7 cm) at 4°C. The column elution buffer contained 50 mM Tris-HCl
(pH 8.3), 40 mM KCl, 6 mM MgCl2, 5 mM dithiothreitol (DTT) and cycloheximide at 90 µg/ml. Column purification was
omitted when more than six mRNAs were tested at one time.

Sepharose column fractions that contained 32P-labeled ribosome-mRNA complexes were supplemented with 600 µM
dATP, dGTP, dCTP and dTTP and with murine leukemia virus reverse transcriptase (Gibco/BRL Superscript II, used at
2 U/µl). Incubation at 37°C for 15 min allowed the primer to be extended up to the position of the bound ribosome. The
positions of ribosomes on each mRNA were deduced from the lengths of the primer extension products, as determined by
co-electrophoresis with an RNA sequence ladder. Appropriate ladders were generated from dideoxy sequencing reactions
carried out at 42°C with avian myeloblastosis virus reverse transcriptase. Denaturing 8% polyacrylamide gels were used
for electrophoresis. Autoradiograms of the dried gels, obtained in most cases with Kodak LS film, were quantified by
densitometry. When weak start codons were tested (e.g. CUG in Figure 7D), AR film was used with an intensifying screen
at -70°C.
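
Because the 23 nucleotide primer is given as DNA complementary to the mRNA, the stretch of coding sequence it anneals to can be recovered by taking its reverse complement. The following is a minimal sketch; the helper name is hypothetical and the script says nothing about where in the CAT gene the segment lies beyond what the text states.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


PRIMER = "CTCAAAATGTTCTTTACGATGCC"

# Sense-strand segment (5' to 3') that the primer anneals to, as DNA and as mRNA.
target_dna = reverse_complement(PRIMER)
target_rna = target_dna.replace("T", "U")
```

Here `target_dna` is GGCATCGTAAAGAACATTTTGAG, the sense-strand segment to which the primer is complementary.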

A previous study that used this primer extension (toeprinting) assay describes some additional details and controls
(Kozak, 1995).



A comparison of selected mRNA and protein abundances in human liver

In order to obtain an estimate of the overall level of correlation between mRNA and protein abundances for a
well-characterized pharmaceutically relevant biological system, we have analyzed human liver by quantitative
two-dimensional electrophoresis (for protein abundances) and by Transcript Image methodology (for mRNA abundances).
[...] database was searched for expressed sequence tag (EST) sequences corresponding to a series of 23 proteins
identified on 2-D maps in the Large Scale Biology (LSB) Molecular Anatomy database. Messages for 19 of these proteins
were detected (4 were undetected) among 7926 liver clones sequenced, and a correlation coefficient of 0.48 was obtained
between the mRNA and protein abundances determined by the two approaches, suggesting that [...] regulation of gene
expression is a frequent phenomenon in [...] organisms. A comparison with published data (Kawamoto, S. et al. [...]) on
the abundances of liver mRNAs for plasma proteins (secreted by the liver) suggests that higher abundance messages are
[...] enriched in secreted sequences. Our data confirms this: of the 50 most abundant [...] coded for secreted proteins,
while none of the 50 most abundant proteins [...] (although [...] proteins were present in this group [...] blood).

1 Introduction

The control of gene expression is achieved by a series [...] nuclear membrane to yield a mature mRNA has been
relatively well characterized for many genes through nucleic acid sequencing approaches. The second phase,
translation into protein [...] and final destruction has been less comprehensively characterized. Both phases
contain important control [...] which mRNA abundance, [...] been employed to build quantitative databases describing
gene expression at the protein level [8-11]. By combining these approaches it is possible for the first time to examine
both levels at which gene expression is controlled, and thereby to develop a global understanding of gene
expression control.



Correspondence: Dr. [...], [...] Medical Center Drive, Rockville, MD [...]

Nonstandard abbreviations: CBB, Coomassie Brilliant Blue

Keywords: [...] / Transcript image / Liver / [...]



[...] we are aware of surprisingly little published [...] on the overall relationship of message and protein
abundance, with the exception of a recent [...] Kawamoto et al. [12], comparing mRNA levels obtained for
plasma protein genes by transcript image methodology with the abundances of the corresponding plasma
proteins in circulation. This report appeared to show a [...] correlation between mRNA and protein abundances
based on data for nine human gene products. It [...] likely, however, that such [...] constitute a special case,
since they are rapidly delivered from the cell of synthesis to the plasma compartment, where many of the
mechanisms [...] protein abundance are presumably absent. We [...] compare mRNA and protein levels for
[...] larger [...] of [...] whether a simple relationship exists [...] whether [...] for [...] proteins.

Table 1. Protein and mRNA abundances in human liver reported for 23 selected molecules

Protein name                            Code     Average    Protein    Protein    Number of  Average
                                                 protein    deviation  abundance  clones     message
                                                 abundance             (%)        (BLAST)    abundance
Carbamyl phosphate synthase             CPS      [...]      12373      2.83       11         0.139%
Actin beta                              ACTB     [...]      17793      1.41       [...]      [...]
Heat shock protein 60                   HSP60    37456      1939       1.05       3          0.038%
Protein disulfide isomerase             PDI      31260      1942       0.87       2          0.025%
78 kD glucose regulated protein / BIP   BIP      [...]      1993       0.87       1          0.013%
Calreticulin                            CRTC     [...]      2076       [...]      3          0.038%
F1 ATPase beta                          F1ATPB   19539      1375       [...]      3          0.038%
Actin gamma                             ACTG     [...]      9012       0.65       17         0.215%
Heat shock cognate 70                   HSC70    21647      [...]      0.60       1          0.013%
Cytochrome B5                           CYB5     18776      1656       0.52       7          0.088%
Endoplasmin                             ENPL     17817      5829       0.50       5          0.063%
75 kD glucose regulated protein         GR75     [...]      [...]      0.46       1          0.013%
Pyruvate carboxylase                    PYVC     14655      1930       0.41       0          Not detected
Heat shock protein 70                   HSP70    8629       1565       0.34       1          0.013%
Tubulin beta 1                          TBB1     7125       [...]      0.20       3          0.038%
Vimentin                                VIME     [...]      952        [...]      0          Not detected
Tropomyosin                             TPM      4090       600        0.11       1          0.013%
NADPH cytochrome reductase              NP450R   3303       1319       0.09       0          Not detected
Tubulin alpha 1                         TBA1     3097       1409       0.09       5          0.063%
Heat shock protein 90                   HSP90    2740       597        0.08       2          0.025%
Cytochrome oxidase II (mit. encoded)    COX-II   [...]      651        0.07       0          Not measured
Laminin receptor                        LAMR     1531       602        0.04       4          0.050%
Lamin B                                 LAMB     [...]      371        0.04       2          0.025%

Protein abundance is given in pixel × gray level (the integrated CBB optical [...] of the appropriate spot or spots on
a 2-D gel) [...] multiple spots comprising a single gene product have been summed. Messenger RNA measurements are
given as a [...] number of clones sequenced in the relevant transcript images.



2 Materials and methods

[...] temperature in the National Biomonitoring [...] National Institute of Standards and Technology [...]
8-fold excess of 9 M urea, 2% NP-40, [...] mercaptoethanol and 2% carrier ampholytes (LKB [...]). [...]
of the resulting sample was [...] DALT 2-D electrophoresis system, and [...] colloidal Coomassie Brilliant Blue
(CBB) G-250 as previously described [...]. [...] and digitized gel images
processed using the Kepler software system (Large Scale
Biology) to give protein abundances in terms of pixel × gray-level values, as well as group average abundances
and standard deviations over a set of seven male human livers. Relative abundances were computed by dividing
individual average abundances by the average total abundance of the proteins resolved on the gels. A series of
proteins was identified on these gels based on close homology with identified rodent liver spots and on
identifications published by Hughes et al. [17]. Total cellular RNA was extracted from samples of human liver tissue
by the method of Chirgwin et al. [18], and poly-A+ RNA was prepared by hybridization to oligo-dT cellulose. Five
µg of poly-A+ RNA was used to construct a cDNA library using the Gubler and Hoffman method [19] in
bacteriophage lambda Uni-ZAP (Stratagene Inc., La Jolla, CA). The library was converted to plasmid DNA by
bulk excision, and individual colonies were selected for DNA template preps. The templates were sequenced
enzymatically (Sanger et al. [20]) on an ABI 37[...] automated DNA sequencer. Templates considered sequenced
successfully contained > 230 bases of cDNA insert sequence after removal of repetitive and low information
sequences, > 90% base call accuracy, and were not of mitochondrial, vector or host origin. Resulting DNA
sequences were analyzed using the BLAST program for similarity with other known primate, mammalian, and
subsequently all divisions of GenBank. Similarity data was stored and tabulated in the LifeSeq software
(Incyte, Palo Alto, CA), from which relative fractions of specific gene products present within the starting RNA
prep were calculated as follows:

% abundance = (# clones representing each gene / total # of genes sampled) × 100.

A total of 7925 clones were sequenced from liver obtained from two individuals: one male (5054 clones) and
one female (2871 clones). Data from Table 1 of Kawamoto et al. [12] was replotted using protein abundances
for human plasma proteins taken as mean values of the range presented in reference [21]. An error in the
abundance of the haptoglobin α1S polypeptide (which was assumed in [12] to account for the entire abundance of
the haptoglobin α2β2 tetramer) was corrected.
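
The abundance formula above amounts to a one-line calculation. The sketch below assumes the 7925-clone total given in this section (read as 5054 male plus 2871 female clones); the function name is hypothetical, and the clone counts of 11 and 17 are chosen for illustration because they would correspond to the 0.139% and 0.215% message abundances quoted in the Results.

```python
MALE_CLONES = 5054
FEMALE_CLONES = 2871
TOTAL_CLONES = MALE_CLONES + FEMALE_CLONES  # 7925 clones in the two liver libraries


def percent_abundance(clones_for_gene: int, total_clones: int = TOTAL_CLONES) -> float:
    """Message abundance as a percentage of all clones sequenced."""
    return 100.0 * clones_for_gene / total_clones


# Illustrative counts: a single clone, 11 clones and 17 clones.
for n in (1, 11, 17):
    print(f"{n:2d} clones -> {percent_abundance(n):.3f}%")
```

A gene seen in a single clone therefore cannot register below about 0.013%, which is the detection floor implied by the depth of sequencing.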



3 Results 

Figure 1. A log-log plot of the abundances of each of the gene products at the protein level (y-axis) and mRNA level
(x-axis). Four proteins for which mRNA measurements were not available (three for which no clones were detected, and
one intentionally deleted from the RTI dataset, COX-II) are shown boxed at the lower left with correct relative protein
abundances. The Pearson product-moment correlation coefficient between the two sets of 19 valid measurements is 0.48.
Each measurement is labeled with a code whose identity is shown in Table 1.

Figure 2. A log-log plot of data on mRNA abundance taken from Kawamoto et al. [12] versus average protein abundances
in plasma taken from [21]. The protein abundance value for the haptoglobin α1S polypeptide has been corrected to reflect
the fact that this subunit accounts for only 21% of the mass of the haptoglobin α2β2 tetramer.

Figure 3. Relative abundance distributions of the top-ranked 100 mRNAs and proteins detected in human liver. The first
(leftmost) molecule is the most abundant, followed by molecules of decreasing abundance through the 100th rank (at the
right). Abundances of both mRNAs and proteins are plotted as a percentage of total detected molecules on a log scale.
Message and protein points at the same rank are not, in general, products of the same gene.

Protein and mRNA abundance data were collected for a set of gene products identified on 2-D gels (Table 1).
Standard deviations of the protein measurements across six individual livers were relatively low, averaging 19% of
the mean abundance. Of the 23 selected proteins, mRNAs for 19 were detected in human liver transcript
images. Of these 19, five were represented by 1 clone, three by 2 clones, four by 3 clones, and the rest by
between 4 and 17 clones. Of the four gene products undetected at the mRNA level, one (cytochrome oxidase
subunit II; COX-II) was deleted from the Transcript Image dataset during standard initial sequence data
workup, which removes all mitochondrial sequences. A plot of protein abundance (expressed as integrated
Coomassie Blue absorbance averaged over seven individual livers) versus mRNA abundance (expressed as
percentage of total cDNA clones in the transcript images of two livers) indicates a modest correlation between the
two (Fig. 1). The Pearson product-moment correlation coefficient obtained from the 19 pairs of measurements
is 0.48. The abundance values obtained at the protein level spanned a 70-fold range, while the detectable
mRNA abundances spanned a 16-fold range for these genes (although the latter value may reflect the limited
number of clones sequenced). One particularly interesting subset of measurements concerns the β and γ
actins. Here the mRNA abundances are, respectively, 0.139% and 0.215%, whereas the protein abundances are,
respectively, 1.41% and 0.65% of the total. In this comparison, both sets of measurements are likely to be quite
accurate, since numerous clones were detected for each of the two messages, and since the two proteins are so
homologous, and have such close pIs, that they should bind CBB similarly. Nevertheless, the relative
abundances at the RNA and protein levels are inverted (β actin is the more abundant protein, while γ actin has the
more abundant message), and the mRNA:protein ratios for the two genes differ by more than a factor of two.
Carbamyl phosphate synthase (CPS), the most abundant protein detected in liver over the pI range of
conventional 2-D gels (pH 4-7), had a relative abundance of 2.83% (protein) and yet comprised only 0.139% of the
total message (less than either actin). In this case, the mature protein is sequestered inside the mitochondrion,
and therefore might be expected to show slow turnover and a consequent large disparity between mRNA and
protein abundance.

A reexamination (Fig. 2) of the data of [12] on genes for
plasma proteins, using estimates for corresponding protein
abundances revised to account for the α2β2 structure
of haptoglobin, showed a higher correlation coefficient
between mRNA and protein abundance (0.96). This
value is probably exaggerated due to the large separation
of the albumin values from the rest of the data: if
albumin is omitted from the calculation, the correlation
coefficient drops to -0.19. However, it is clear that the
plasma proteins are represented by many more mRNA
copies than major cellular proteins: albumin, for
example, accounts for about 14% of the total number of
clones examined [12], with a number of other plasma
proteins accounting for more than 1% of the total each.
By contrast, none of the cellular proteins chosen from
the 2-D gel data accounted for much more than 0.1% of
the mRNAs sequenced. To further pursue this observation,
we compared the relative abundance distributions
of the 100 top-ranked (most abundant) mRNAs and proteins
in our data sets (Fig. 3). Forty-one of the top 100
mRNAs, and 29 of the top 50, coded for proteins known,
or expected from sequence, to be secreted from the liver,
while none of the top 100 proteins appeared to be secretory
forms of the human plasma proteins. The two most
abundant proteins in these samples (hemoglobin β and
albumin) as well as two of lower abundance (α1 antiprotease
and transferrin) were blood proteins that constitute
contaminants of the liver in this context: proteins
which would have been removed by perfusion.
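
The sensitivity of a correlation coefficient to a single well-separated point, as described for albumin above, can be illustrated with a small sketch. The values below are hypothetical and chosen only to mimic the situation (eight clustered points plus one far-removed point); they are not the Kawamoto et al. data, and the function name `pearson` is our own.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical abundances: the last pair plays the role of a dominant
# outlier such as albumin; the first eight pairs form a loose cluster.
protein = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
mrna = [0.8, 0.3, 0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 14.0]

r_with_outlier = pearson(protein, mrna)
r_without_outlier = pearson(protein[:-1], mrna[:-1])

print(f"r with outlier:    {r_with_outlier:.2f}")
print(f"r without outlier: {r_without_outlier:.2f}")
```

With these illustrative numbers the coefficient is close to 1 when the outlier is included and becomes negative when it is dropped, which is the same qualitative behavior reported for the plasma protein data.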



4 Discussion 

Despite extensive work on the regulation of many individual
genes, little attention appears to have been paid
to the global question of the relation between mRNA
and corresponding protein abundance in eukaryotes. We
have attempted to provide an initial estimate of the
relationship of mRNA and corresponding cellular protein
abundances through use of correspondences between
two databases: the Molecular Anatomy (2-D gel) and
LifeSeq (Transcript Image) databases of human liver.
Using a panel of 23 proteins identified on 2-D gels of
human liver, we searched LifeSeq to determine the
number of clones matching the corresponding gene
sequence by BLAST. Matches were found for 19 proteins,
and the correlation coefficient obtained over this
set of data was 0.48. This number is intriguingly close to
the middle position between a perfect correlation (1.0)
and no correlation whatever (0.0). One simple interpretation
of such a value is that the two major phases of gene
expression regulation (transcription through message
degradation on the one hand, and translation through
protein degradation on the other) are of approximately
equal importance in determining the net output of functional
gene product (protein). Several issues may limit
the quantitative accuracy of this result. First, the protein
measurements rely on CBB binding to a series of different
proteins. Although the measurements obtained
show good (low) standard deviations across a set of six
individual livers, it is well known that different proteins
can bind CBB with different affinities. Thus the measurement
scale for one protein may differ from another by
up to approximately twofold. Since, however, these relative
scale errors should be normally distributed, we
expect them to have little effect on the overall correlation.
Precision of the mRNA measurements is also
limited, in this case because a limited number of clones
was detected for the selected proteins. Five genes, for
example, were represented by only one clone each
among the 7925 clones sequenced from the respective
cDNA tissue libraries. This low relative expression at the
mRNA level is expected, since a majority of the high
abundance mRNAs in liver code for plasma proteins.
However, such small numbers of clones lead to potentially
large quantitative errors because of sampling error.
Here again, we believe these errors should be relatively
random across the set of proteins chosen, and thus
should not skew the result appreciably. A third potential
difficulty is that the databases used for the protein and
mRNA abundance estimates were prepared from different
samples. In future, it will thus be of great interest
to repeat the experiment using the same samples to
examine both mRNA and protein abundances.
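
The size of the sampling error mentioned above can be estimated with a short sketch. It assumes, as a simplification not stated in the text, that clone counts follow Poisson statistics, so the standard deviation of a count k is about sqrt(k). The total of 7925 clones is taken from the text; the function name and the example counts are illustrative only.

```python
from math import sqrt

TOTAL_CLONES = 7925  # clones sequenced, as stated in the text

def abundance_with_error(clone_count, total=TOTAL_CLONES):
    """Return (relative abundance in percent, approximate relative standard error).

    Assumes Poisson counting statistics: sd(count) ~ sqrt(count),
    so the relative error of the abundance is ~ 1 / sqrt(count).
    """
    abundance_pct = 100.0 * clone_count / total
    relative_error = 1.0 / sqrt(clone_count)
    return abundance_pct, relative_error

# Illustrative clone counts (not taken from the paper's data table).
for count in (1, 4, 16, 100):
    pct, rel = abundance_with_error(count)
    print(f"{count:4d} clones -> {pct:.4f}% of clones, relative SE ~ {rel:.0%}")
```

Under this assumption a gene seen as a single clone has a relative standard error of roughly 100%, which is why measurements based on one clone each are individually imprecise even if the errors average out across the set.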

Despite these potential sources of error, at least one
homologous pair of proteins (the β and γ actins) shows
persuasive evidence of post-transcriptional regulation,
with mRNA-to-protein ratios differing by more than a
factor of two between the two genes. This is a particularly
striking case since the two proteins are essentially
indistinguishable in function (apart from affinity for
MgADP [22]), have very similar sequences, and are produced
in a constant ratio (approximately 2:1 in males) in
virtually all cell types. One possible alternative explanation
could be a sex difference in liver expression of γ
actin, as is seen in rodents [23] where γ actin protein
expression averages almost twice as high in females as
males. This seems unlikely since 64% of clones in the
RTI data were from male liver, and all the 2-D data was
from male livers.
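
The "factor of two" statement can be checked directly from the four abundance values quoted in the Results. The short calculation below is only a worked check of that arithmetic; the variable names are ours.

```python
# Relative abundances (% of total) quoted in the Results for the two actins.
mrna_pct = {"beta_actin": 0.139, "gamma_actin": 0.215}
protein_pct = {"beta_actin": 1.41, "gamma_actin": 0.65}

# mRNA-to-protein ratio for each gene.
ratio = {gene: mrna_pct[gene] / protein_pct[gene] for gene in mrna_pct}

# How many times larger the gamma actin ratio is than the beta actin ratio.
fold_difference = ratio["gamma_actin"] / ratio["beta_actin"]

for gene, value in ratio.items():
    print(f"{gene}: mRNA/protein = {value:.3f}")
print(f"fold difference between the two genes: {fold_difference:.2f}")
```

The ratios come out near 0.10 for β actin and 0.33 for γ actin, so the two genes differ by somewhat more than threefold, consistent with "more than a factor of two".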

An analogous set of data for plasma proteins secreted by
the liver has been published by Kawamoto et al. [12] and
we have reanalyzed their values to see whether a similar
mRNA-to-protein relationship holds. It appears, based
on nine plasma proteins, that a higher correlation coefficient
applies: 0.96. This result is less convincing, however,
because one gene product (albumin) is well-separated
from the cluster of the remaining eight, and thus
exercises a disproportionate influence on the correlation
coefficient. In fact, if albumin is omitted from the calculation,
the correlation coefficient is reduced to -0.19,
which suggests a very poor correlation.

What is perhaps more striking is the relatively much
higher abundance of the plasma protein mRNAs as
compared to major cellular proteins such as carbamyl
phosphate synthase, the actins, or cytochrome b5.
Mid-abundance plasma proteins were represented by mRNAs
having approximately 100-fold higher relative abundance
than mid-abundance cellular proteins. This result is
verified by a direct comparison of the relative abundance
distributions of the 100 top-ranked mRNAs and proteins
in our data sets (which are, in general, different sets of
genes). Twenty-nine of the top 50 messages are secreted
products, while none of the top 50 proteins appear to be
the pro-form of a secreted molecule. Such a conclusion
is not surprising, since the liver is responsible for
generating high protein concentrations in the relatively large
plasma compartment of the body, but does so by means
of closely coupled synthesis and secretion with little
accumulation of precursor proteins in process. This
points to a potentially significant difference in the
pictures obtained from mRNA and protein abundance
databases. Major secreted proteins appear to have much
more abundant mRNAs than many important cellular
proteins, and hence mRNA abundance databases that
concentrate on a small number of the highest abundance
messages may be biased towards secreted proteins
over cellular molecules. This represents an advantage of
the mRNA approach relative to protein databases in the
search for novel cytokines and other secreted proteins,
but a disadvantage in the characterization of cellular
metabolic and control processes. Additionally, it suggests
that mRNAs for secreted proteins may have, on the
whole, shorter half-lives than mRNAs for cellular
enzymes, the latter being more frequently regulated at
the translational level.

We also found important differences in the overall
shapes of the relative abundance distributions of the 100
top-ranked mRNAs and proteins. While both distributions
contain a few very high abundance molecules (in
the 3-10% range) they appear to diverge significantly
below the 15th most abundant gene product, with proteins
16-100 accounting for roughly twice as high a relative
abundance as the 16th-100th mRNAs. Not all proteins
are represented on the 2-D gels used here (which
fail to resolve proteins with pI > 7), but the estimated
40% of proteins thus excluded would not affect the
shape of the distribution over positions 50-100 significantly
if they have an abundance distribution similar to
the pI 4-7 proteins (based on a simulation using the
data shown). The mRNA abundance distribution covers
all cloned messages (not a subset of genes), and for
abundant mRNAs it should be complete as it stands.
Altogether, the top 100 mRNAs comprise 51.3% of the
total clones, while the top 100 proteins comprise 63.9%
of the total protein detected. Hence it appears likely that
the distribution of protein abundances is significantly
different from that of mRNAs, showing a more gradual
fall-off in the region examined, and that techniques able to
detect down to a specified percent abundance threshold
would reveal more proteins at a given threshold than
mRNAs. As the protein and nucleic acid databases
expand, we anticipate the possibility of generating
successively more robust estimates of the global relationship
between mRNA and protein abundance, and thus a
better understanding of multi-level gene expression control
in complex organisms such as man.

Human liver samples analyzed by 2-D electrophoresis were
kindly provided by the National Biomonitoring Specimen
Bank at the US National Institute of Standards and Technology
under the direction of Dr. Stephen Wise.

A database of protein expression in lung cancer

We have developed a comprehensive approach to identifying molecular changes in 
lung cancer that includes both genomic and proteomic analyses. The related effort 
has produced a large amount of data pertaining to gene expression at the RNA and 
protein levels. As a result, we have constructed a database that contains protein 
expression data on lung cancer as well as other relevant data including DNA micro- 
array derived data. A large number of proteins that are expressed in different types 
of lung cancer have been identified and have been correlated with the expression 
measures for their corresponding genes at the RNA level. The database is intended to 
facilitate our effort at developing novel classification schemes for lung cancer and the 
identification of novel markers for early diagnosis.

1 Introduction

There is substantial interest in implementing novel and 
comprehensive strategies for the molecular analysis of 
tumors and relevant biological fluids. We have implement- 
ed a strategy for the molecular analysis of lung cancer 
that integrates genomic analysis using genome scanning 
procedures, transcriptomic analysis using cDNA and 
oligonucleotide microarrays, and proteomic analysis. For 
the latter, we have relied to date primarily on 2-D poly- 
acrylamide gels. However the 2-D gel approach is being 
increasingly complemented with additional analyses 
using liquid based protein separations and protein micro- 
arrays. While on the one hand proteomic analysis com- 
plements genomic analysis for a global assessment of 
gene expression, on the other hand proteomic analysis 
uniquely contributes an understanding of protein post- 
translational modifications and the location of protein 
gene products in subcellular compartments. The scope 
of our overall molecular analysis study of lung cancer is 
shown in Fig. 1. Important objectives include the development
of novel molecular classification schemes for lung
cancer and the identification of novel markers for the early 
detection of lung cancer. 

The large body of proteomic and other data we have col- 
lected has necessitated the construction of a database in 
which basic and derived data is organized. There have 
been relevant related efforts at databasing of 2-D data 
by other groups. One such database is the 2DWG Meta- 
database of 2-D gel images, which contains 2-D derived
data acquired by a combination of review of results as
well as submissions by investigators [1]. However, to 
date there are only three entries found matching the query 
for human lung images in the 2DWG Web Gel Meta-data- 
base web site (http://www-lecb.ncifcrf.gov/2dwgDB). 
The database we have constructed, in its entirety, is rele- 
vant to a variety of cancers. However the focus of this 
review is the use of the database to achieve our objec- 
tives related to the molecular analysis of lung cancer 
specifically. The goal of the database is to facilitate 
planned analyses, i.e. statistical analysis, as well as 
post-planned analyses, i.e. data mining. The intent is to 
make the database queryable on a protein-by-protein
basis as well as through subgrouping of samples ana- 
lyzed, in a menu driven fashion. Internet and WWW tech- 
nologies are used not only to allow investigators to view 
visual and textual data together, but also to allow investi- 
gators in other locations to retrieve archival data using 
different computer systems. 



2 Laboratory information processing 
system 

A long-standing Laboratory information processing sys- 
tem (LIPS) developed by our group [2] has been adapted 
for our database. LIPS consists of multiple systems and 
processes. A variety of data is stored in a variety of formats
with individualized programs for viewing the data. Typical 
processes using LIPS include: sample inventory; digitize 
images; detect and quantify spots; match spots and nor- 
malize spot sizes across images, choose spots for MS 
analysis, enter profiles from MS-Fit web search; transfer 
data to statistical software or spreadsheets. 

Data tend to be complex and dynamic in that their con- 
tents are ever changing as information is added, modified 
or removed. Simple or intensive analyses of 2-D patterns
have produced a large amount of data. Data is both textual
(e.g., reports and numbers) and visual (e.g., 1-D and
2-D gel images). 

Some types of data generated by LIPS include: 2-D pro- 
tein gel images (silver, modified silver, blots, ^ labeled 
gels); genome scans; 1-D gel images; spot information- 
protein names; gene information from DNA microarrays; 
MS files and MS-Fit reports (Word documents); figures 
(Raster files on the Sun and actual photographs); data 
from protein microarrays; data from liquid chromato- 
graphy separations. 

However, as computer technology has evolved, quantum 
jumps in improvements in organizing unstructured, scien- 
tific data into a structured database have become possible. 
A major function of our database and its interfaces is to 
serve as a computer-based tool for capturing the basic 
quantitative data from 2-D gel images and derived data 
and findings derived from different studies about proteins 
detected in 2-D patterns of various tumor types [3]. As a 
result, investigators are provided with easy access to data 
as well as a means for intelligent data mining of the existing 
data. A logical view of the database schema is shown in 
Fig. 2 and a list of tables and their attributes are shown in 
Table 1. 
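
As an illustration of how part of such a schema could be expressed, the sketch below declares three of the tables listed in Table 1 (Image, Spot, Match) using SQLite from Python. It is a minimal sketch, not the schema actually used for the database: the column subset, SQL types, snake_case names, the table name `spot_match`, and the sample rows are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three tables modeled on Table 1; the composite key of Spot is
# (Image Name, Spot No). Types and naming style are assumed.
conn.executescript("""
CREATE TABLE image (
    image_name  TEXT PRIMARY KEY,
    date_imaged TEXT,
    image_type  TEXT
);
CREATE TABLE spot (
    image_name TEXT    NOT NULL REFERENCES image(image_name),
    spot_no    INTEGER NOT NULL,
    x          REAL,
    y          REAL,
    intensity  REAL,
    spot_type  TEXT,
    PRIMARY KEY (image_name, spot_no)
);
CREATE TABLE spot_match (
    match_id          INTEGER PRIMARY KEY,
    master_image_name TEXT    NOT NULL,
    master_spot_no    INTEGER NOT NULL,
    image_name        TEXT    NOT NULL,
    spot_no           INTEGER NOT NULL
);
""")

# Hypothetical rows: one master image, one study image, one matched spot pair.
conn.execute("INSERT INTO image VALUES ('master_adeno', NULL, 'master')")
conn.execute("INSERT INTO image VALUES ('study_001', NULL, 'silver')")
conn.executemany(
    "INSERT INTO spot (image_name, spot_no, x, y, intensity) VALUES (?, ?, ?, ?, ?)",
    [("master_adeno", 10, 120.0, 85.0, 1.0), ("study_001", 57, 118.5, 86.2, 0.7)],
)
conn.execute(
    "INSERT INTO spot_match (master_image_name, master_spot_no, image_name, spot_no) "
    "VALUES ('master_adeno', 10, 'study_001', 57)"
)

# Given a spot on a master image, list the study spots matched to it.
rows = conn.execute(
    """
    SELECT s.image_name, s.spot_no, s.intensity
    FROM spot_match AS m
    JOIN spot AS s
      ON s.image_name = m.image_name AND s.spot_no = m.spot_no
    WHERE m.master_image_name = ? AND m.master_spot_no = ?
    """,
    ("master_adeno", 10),
).fetchall()
print(rows)
```

Keeping matches in their own table, rather than as columns on the spot table, lets one study spot be related to a master spot without changing the quantitative spot record itself.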

The following are important features of the 2-D gel related 
component of our lung protein database: 

(1) All 2-D gel images are placed in hierarchies so that: (a) 
every study image is matched to one master image, i.e. all
lung adenocarcinoma tumor images are matched to one 
master image; (b) every master image is matched to at 
most one (higher) master image, i.e. all masters for differ- 
ent lung tumor types are matched to one tumor master. 



This allows the database to have an indexing mechanism 
that can relate a spot to any gel in the hierarchy. The data- 
base provides a capability to access the basic and 
derived data using the following types of queries: (a) given 
a spot on any gel, find all spots that are matched to it; (b) 
given a spot on any gel, find all protein identifications 
made for it, and (c) given a spot on any gel, find all find- 
ings/conclusions that are linked to it. 

(2) All samples (and thereby gels derived from them) are 
identified by a list of source characteristics in four major 
categories: experiment code; cell type code; treatment 
code; and fraction code. This allows the database to 
have an identification mechanism that can relate a gel to 
any source in the hierarchy. The database provides a cap- 
ability to find all images as follows: (a) given a category, 
find all images that have the same value of the category; 
and (b) given any combination of four categories, find all 
images that satisfy the condition. 

(3) All protein spots are identified by a list of characteristics
in four major attributes: protein name; pI and Mr;
accession number; and protein sequence data. A spot
may have several findings and there may be many kinds 
of findings derived from a particular study. If possible the 
findings are recorded in a consistent way, however this is 
not always possible due to some characteristics of such 
findings (e.g., statistical analysis matrices, MS data, and 
Affymetrix data). As the number of studies has increased, 
the amount of data produced has increased. Some of the 
data (e.g. mass spectra and Affymetrix (Santa Clara, CA, 
USA) oligonucleotide chip readouts) is very large, and fills 
up the hard disks of the computers where it is collected. 
Such data is generally saved on CD-Rs, and only the most 
recent data is kept in a computer. It is sometimes easier
to post individual files on the web. Individual web pages
have been created with textual and visual data that are
difficult to relate in a table. This allows investigators an
opportunity to analyze 2-D gel and other images containing
spots that have not been detected or identified and to
compare data across studies. In addition this is used to
link our data to other biological knowledge repositories
such as GenBank, PIR International, and SWISS-PROT.

Figure 2. A logical view of database schema.

Table 1. A list of tables and their attributes in the lung protein database

Table name              Unique identifier       List of attribute types
                        (Primary Key)

Project                 Project Name            Project Type, Description
Sub Project             Sub Project Name        Date Started, Comment
Subject                 Subject ID              Case No, Sex, Birthdate, Comment
Tissue Sample           Tissue Sample ID        Tissue Type, Diagnosis, Date Sample Taken,
                                                Date Received, How Received, Source, Comment
DNA Sample              DNA Sample ID           Date Produced, Concentration, Freezer
                                                Location, Comment
Gel                     Gel ID                  Sample ID, Batch ID, Enzyme Combination,
                                                Electrophoresis Process, Comment
Image                   Image Name              Date Imaged, Exposure Time, Image Type,
                                                Image Location, Comment
Spot                    Image Name & Spot No    X, Y, Intensity, Spot Type
Match                   Match ID                Master Image Name, Master Spot No,
                                                Image Name, Spot No
Experiment              Experiment Code         Description
Cell Type               Cell Code               Description
Treatment               Treatment Code          Description
Fraction                Fraction Code           Description
Protein Sample          Sample ID               Experiment Code, Cell Code, Treatment Code,
                                                Treatment Date, Fraction Code, Comment,
                                                Project Type, Gel ID, Image Name,
                                                Image Type, Researcher
Protein                 Protein Name            Image Name, Spot No
Other Link              Protein Link ID         Protein Name, Database Name, URL
Findings                Image Name & Spot No    Category, Designation, Finding
Protein Identification  Image Name & Spot No    Accession No, cDNA cloning, Cell Lines,
                                                Facility, Date, Genomic Cloning,
                                                Glycosylate, Mr, pI, Phosphorylation,
                                                Phosphorylation Residues, Related Spot,
                                                Sequences, Source of Protein, Name,
                                                Structural Modification, Subcellular
                                                Localization, Tissue Distribution,
                                                Type of Membrane, Type of Sequencing
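
The two query types described for this section, finding all spots matched to a given spot through the image hierarchy and finding images by any combination of the four source categories, can be sketched with plain Python dictionaries. This is only an illustration under assumed names: the image names, spot numbers, category codes, and the functions `root_spot`, `matched_spots`, and `find_images` are hypothetical and are not part of LIPS or the database itself.

```python
# (image, spot_no) -> (master image, spot_no on that master).
# Study images point to a tumor-type master; masters point to one tumor master.
spot_match = {
    ("adeno_study_01", 57): ("adeno_master", 10),
    ("adeno_study_02", 61): ("adeno_master", 10),
    ("adeno_master", 10): ("tumor_master", 3),
    ("squamous_study_01", 44): ("squamous_master", 8),
    ("squamous_master", 8): ("tumor_master", 3),
}

def root_spot(spot):
    """Follow matches upward until the spot has no higher master."""
    while spot in spot_match:
        spot = spot_match[spot]
    return spot

def matched_spots(spot):
    """All other spots in the hierarchy that resolve to the same top-level spot."""
    target = root_spot(spot)
    group = {target}
    group.update(s for s in spot_match if root_spot(s) == target)
    group.discard(spot)
    return group

# Each image carries the four source categories named in the text.
images = [
    {"name": "adeno_study_01", "experiment": "E1", "cell": "ADENO",
     "treatment": "NONE", "fraction": "WHOLE"},
    {"name": "adeno_study_02", "experiment": "E2", "cell": "ADENO",
     "treatment": "NONE", "fraction": "MEMBRANE"},
    {"name": "squamous_study_01", "experiment": "E1", "cell": "SQUAMOUS",
     "treatment": "NONE", "fraction": "WHOLE"},
]

def find_images(**criteria):
    """Names of images whose category codes equal every given criterion."""
    return [img["name"] for img in images
            if all(img[key] == value for key, value in criteria.items())]

print(sorted(matched_spots(("adeno_study_01", 57))))
print(find_images(cell="ADENO"))
print(find_images(experiment="E1", fraction="WHOLE"))
```

Because every image has at most one master, following matches upward always ends at a single top-level spot, which is what allows a spot on any gel to be related to spots on any other gel in the hierarchy.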



3 Contents of the lung cancer protein 
database 

A large number of studies involving lung cancer have been 
independently performed in the laboratory. At the protein 
level, these studies have resulted in 1349 images, over 
1000 of which are images of 2-D gels for which information 
has been recorded in the lung protein database. This num- 
ber represents a fraction of the 30 682 2-D gels produced by 
our group for different studies, which include studies of other 
cancer types encompassing head and neck, esophagus, 
liver, colon, pancreas, ovary, breast, prostate, brain, leu- 
kemias and childhood tumors. A list of protein gel images 
related to lung studies is shown in Table 2. While lung
adenocarcinomas represent a major portion of the database,
other lung tumor types including squamous cell carcinomas
and small cell lung cancers are represented, as are control
lung tissues. Other 2-D patterns were produced from
studies of cell lines that have been manipulated by
transfection or by treatment with specific agents, as well
as patterns produced after different cell fractionation
schemes. Substantial emphasis is currently being placed
on the comprehensive profiling of lung cancer derived
surface membrane proteins.

Table 2. A high-level categorization of lung protein 2-D
images by sample type

Lung Sample Types    Number of images

Cell Lines            421
Cystic Fibrosis        44
Tumor                 635
Normal                170
Plasma                 61
Other                  18
Total                1349

Mass spectrometry and/or N-terminal sequencing of protein
spots from 2-D gels of lung tumor samples or cell
lines have led to the identification of a large number of 
proteins expressed in lung cancer. Also, most identifi- 
cations made for proteins from a sample type can often 
be confidently transferred to matching protein spots on 
master images from lung studies. Table 3 and Fig. 3 ex- 
hibit some of the progress we have made in identifying 
proteins in 2-D gels of lung samples. 




Figure 3. Small cell lung tumor master containing identified
proteins.

Spot#   NCBI Accession Number   GenBank Number   pI   Mr   Official gene














577 


398953 


P31947 


4.361 


30.052 


SFN 


615 


112695 


P29312 


4.569 


29.101 


YWHAZ 


1279 


437363 


AAA35483 






YWHAH 


24 


2507178 


P16118 






PFKFB1 


928 


4502201 


NP_001649 


6.31 


20.697 


ARF1 


319 










ALB 


800 










ALB 








5.957 


70.244 


ALB 


207 


4502031 


NP_000680 


6.811 


56.966 


ALDH1A1 


543 


3493209 


AAC36469 


7.812 


32.379 


a i/rn o i r\ 
AKH1 D1U 


14 


1 30737 


P05187 


5.86 


57.954 


ALPP 


666 










ALB 


693 










ALB 


332 


4503571 


NP_001419 


7.742 


45.407 


EN01 


268 


8272482 


AAFZ4221 






HCR 


268 


5360901 


BAA82158 








172 


5174477 


NP_006073 


5.099 


52.848 




802 






4.796 


17.194 




460 


113944 


P04083 


6.73 


39.264 


ANXA1 


522 


4502107 


NP_001145 


4.83 


33.326 


ANXA5 


685 






5.124 


25.4 




1278 


71967 


LNHUPS 








795 






6.33 


18.909 


ARF1 


96 


227920 


171 341 OA 


5.34 


14.584 


LGALS1 


349 


113270 


P02570 


5.29 


41.7 


ACTB 


61 


29497 


X59511 






SPTB 


229 


4507729 


NP_001060 


4.75 


49.8 


TUBB 


22 


11995077 


AB038211 








104 


4757900 


NP_004335 


3,668 


Of 




469 






3.442 


48.772 




36 


4929561 


AAD34041 


6.25 


49.296 




149 


4502643 


NP_001753 


7.034 


60.547 


CCT6A 


1338 


4502899 


NP_001824 






CLTA 


789 










COL15A1 


85 


1362772 


E57233 






CPLX2 


856 






5.415 


11.858 


CRABP2 


855 


4506451 


NP_002890 


4.667 


10.297 


RBP1 


439 


180570 


AAC31758 


5.34 


42.618 


CKB 


872 






4.568 


9.2 




321 


1673575 


U76549 






KRT8 



ID 

Source 



Name 



L95 



LM 
LM 
L95 

DMS 79 



LM 
LM 
LM 

SKMES 



LM 
L95 
L95 

LM 

LM 

A549 

LM 

LM 

L95 

LM 

SKMES 
LM 

DMS 79 
LM 

DMS 79 

LM 

LM 

A549 

A549 

L95 

DMS 79 
LM 

LM 



LM 
A549 



(spot 1496L) possibly
pancreatitis-associated
protein

14_3_3_sigma

14_3_3_ZetaDelta

14-3-3 η

6PF-2-K/FRU-2,6-P2ASE

liver isozyme
ADP-ribosylation factor 1
Albumin
Albumin
Albumin

Aldehyde Dehydrogenase
AldoKeto Reductase
Alkaline Phosphatase,

placental type 1 precursor
Albumin
Albumin
α-Enolase
α-helical protein
α-helix coiled-coil rod
homologue
α Tubulin
Amyloid B4A
Annexin I
Annexin V
ApoAI

Apoprotein, pulmonary

surfactant
ARF1

β-Galactoside soluble lectin

β-Actin

β-Spectrin

β Tubulin

Calmodulin dependent
phosphodiesterase

Calreticulin

Calreticulin32

CGI-46 protein

Chaperonin-like protein

Clathrin light chain A

Collagen, type XV, α 1

Complexin II

Cellular retinoic acid-
binding protein 2

Cellular retinol-binding
protein 1, CRBP1

Creatine kinase, brain

Cytochrome C oxidase VA

Cytokeratin 8
Table 3. Continued

ID Source   Name   Spot#   NCBI Accession Number   GenBank Number   pI   Mr   Official gene



A549 


Cytokeratin 8 


446 


2506774 


P05787 


5.52 


53.674 


KRT8 


A549 


Cytokeratin 8 


439 


2506774 


P05787 


5.52 


53.674 


KRT8 


LM 


Cytokeratin 15, keratin 15 


289 


4504915 


NP_002266 


4.153 


49.261 


KRT15 


A549 


Dihydrolipoamide dehydro- 
genase, mitochondrial 


759 


118674 


P09622 










precursor 










21.015 


DJ1 


LM 


DJ1 


811 


6005749 


NP_009193 


6.44 


U IVI 


DJ1_MER5 


700 






6.263 


24.001 




DMS 79 


dj475N16.1 (CTG4A) 


57 


6969163 


CAB75301 




20.136 




L M 


DUTPhase 


769 






5.719 


HIP2 


Lao 


E2 ubiquitin-conjugating 


1445 


4885417 


AB022435 








enzyme 










22.961 




LM 


EIF4d 


718 






5.104 




LM 


EIF5A 


839 






4.599 


10.957 


ERH 




Enhancer of rudimentary 


902 












(Drosophila) homolog 










47.286 


EN02 




"~ Enolase 2 (y, neuronal) 


295 


119347 


P09104 


4.94 


LM 


FNPI HSP100 


18 






4.945 


78.717 


ATP5JD 


AS4Q 


F1 FO-type ATP synthase 


1519 


5453559 


NP.0063475 


5.21 


18.491 




subunit d 












CCNE1 


DMS 79 


G1/S specific cyclin E1 


31 


3041657 


P24864 




31.772 


LM 


G3PD 


540 






7.457 


ACTG1 


LM 


y-Actin 

OfyUAaldOw I 


348 


113278 


P02571 


5.146 


42.315 


L IVI 


650 


417246 


Q04760 


4.833 


25.572 


GL01 


rML) ( o 


firam ilnm/tA-maprnnhsne 

Ol dl lUIUw/ it? 1 1 laui \J\Ji i«yc 

colony-stimulating factor 


86 


117561 


P04141 






CSF2 




precursor 










73.124 




LM 


GRP75 


87 






5.9341 




LM 


GRP78 


79 






5.187 


68.109 


GSTP1 


LM 


GSTpi 


690 


726098 


AAC13869 


5.5 


25.4 


Heat shock 27 kD protein 1 


626 


123571 


P04792 


7.83 


22.327 


HSPB1 




Heat shock 27 kD protein 1 


631 


123571 


P04792 


7.83 


22.327 


HSPB1 


A549 


Heterogeneous nuclear 


457 


5031753 


NP_005511 






HNRPH1 




ribonucleoprotein H 










36.558 




A549 


HLA-B71 or HLB-B71 
variant 


818 


511776 


U11269 


5.55 




LM 


HSC70_HSP73 


120 






5.893 


72.429 




LM m 


^ HSP90 


46 






5.276 


76.096 




L95 1 


* HSPC089 


1036 


6841118 


AAF28912 








L95 


HSPC321 


1547 


6841292 


AAF28999 








L95 


HSPC321 


1548 


6841292 


AAF28999 








A549 


HuCha 60 SP 60 


181 


4504521 


NP_002147 


5.7 


61 


HSPD1 


L95 


Huntingtin associated protein 


1595 


1708113 


P54255 






HAP1 


L95 


Huntingtin associated protein 


1548 


1708113 


P54255 




54.908 


HAP1 




Intemexin neuronal intermediate 1 83 


6225015 


Q16352 


5.48 


IN A 




filament protein, alpha 












KRT17 




Keratin 17 


934 


4557701 


NP_00413 


4.97 


48.106 


DMS 79 


KIAA1 610 protein 


26 


10047295 


AB046830 




44.03 




LM 


LamR 


340 






4.549
Table 3. Continued

ID Source   Name   Spot#   NCBI Accession Number   GenBank Number   pI   Mr   Official gene












Lectin, galactoside-binding,


873 


227920 


171 341 OA 


5.34 


14.584 


LGALS1 




soluble, 1 (galectin 1)














LM


Lipocortin I


460 


113944 


PO4083 


6.73 


39.264 


ANXA1 


A549 


L-Lactate Dehydrogenase • 
H rhain 

1 1 ul lull 1 


906 


126041 


P07195 






LDHB 


A549 


L- lactate oenyarogenase 
H chain (LDH-B) 


906 


4557032 


NP_002291 






LDMB 


L M 


Lam in B 


924 






5.787 


69.625 






Lymphocyte cytosolic 








5.20 


70.290 


LCP1 




protein i (L-piasxin; 












PSMAS 


L95 


Macropain subunit zeta 


1 lift 


A^nfii ft7 


MP fin?? 89 






DMS 79 


mho class 1 
histocompatability 


33 


1236790 


U06487 










antigen protein 










30.239 




DMS 79 


Multicalaytic endopeptidase 
comples chain C2, 


74 


346314 


JC1445 


6.51 






— long splice from 










15.172 




LM 


MyosinLightCahin3 


815 






4.11 




A549 


Nm23, NDPKA 


1456 


127981 


P15531 


5.809 


19.216 






Non metastatic cells 1 , 


793 


4557797 


NP_000260 


5.83 


17.148 


NME1 



LM 

LM 
LM 
LM 
LM 
LM 
L95 
L95 
L95 



L95 
A549 

A549 



A427 



L95 



L95 
L95 



protein (NM23A) 
Op 18, leukemia-associated 

phosphoprotein p18 (stahmin) 
Op 18a 
Op 18m 

Phosphoglycerate MutB 
Phospholipase C 
PIMT 

Pinch-2 protein 
Pinch-2 protein 
Possibly activin type II 

receptor precursor; 

DNA polymerase epsilon 

subunit B; or ITF-1 DNA 

binding protein 
Possibly BTF2p44 
Possibly carbonci anhydrase III 

or UCH-L1;PGP9.5 
Possibly 6-3,5 5-2,4- 

Dienoly-CoA isomerase 

precusor 
Possibly G1 to S phase 

transition protein; serine- 

theonine phosphatease 

protein; or phosphatase 

5 protein 
Possibly GCF2 fusion 

protein or Bamacan 

homolog 
Possibly glycosyltransferase 
Possibly HLA DQ 



809 

807 

808 

639 

248 

662 

1695 

1825 

627 



1496 
1242 

2138 



321 



320 



1519 
1271 



5031851 

5031851 
5031851 



9800509 
9800509 



NP 005554 5.783 



17.164 LAP18 



NP.005554 
NP.005554 



AAF99328 
AAF99328 



4.962 

5.302 

7.083 

5.7 

6.211 



13.655 

14.857 

27.227 

56.5 

25.804 



LAP18 
LAP18 
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IP N am e Spot# NCBI GenBank pi M r Official 

Source Accession Number gene 

Number 



A549 Possibly hydroxyacylglutathione 1080 

hydrolase or B-lymphocyte 

Antigen CD20 
L95 Possibly microtubule-based 1438 

motor protein 
L95 Possibly putative novel 1427 

protein similar to HPS 
L95 Possibly Spi-B; unnamed 1187

protein product (AK001844);

or protein kinase (y15801) 
L95 Possibly T-complex protein 630 

A549 Possibly U1 small nuclear 1148

ribonucleoprotein A



L95 


Possibly unnamed protein 
product (AK000369) or 
syntaxin 


1064 












L95 


— Possibly unnamed protein 
product or Pro0282p 
protein 


1351 














procollagen-proline, 


110 


^5074dU 


DA7007 


A 7C 


Of . I i D 


D/1UD 

rnrlD 




2-oxoglutarate 4-dioxygenase
















(proline 4-hydroxylase), beta
















polypeptide (protein disulfide 
















isomerase; thyroid hormone
















binding protein p55)
















proliferating cell nuclear 


O I o 




pi 7070 


44 


37.5 


PCNA 




antigen 
















Protein phosphatase 2 


104 


5915686 


P30154 


4.84 


66.202 


PPP2R1B 




(formerly 2A), regulatory 
















subunit A (PR 65), β-isoform














LM 


Protein H precursor 


40 






3.714 


62.182 




LM 


Protein kinase C inhibitor 1 


882 


4885413 


NP_005331 


7.714 


11.521 


HINT 


L95 


Pulmonary surfactant 
apoprotein precursor


1278 


190565 


AAA36510 






SFTPA1 


L95 


Pulmonary surfactant- 
associated protein 


1278 


131412 


P07714 






SFTPA1 


LM 


R33729_1 


848 


3355455 


AAC27824 


7.508 


13.163 






Retinol-binding protein 1 , 


855 


4506451 


NP_002890 


4.99 


15.850 


RBP1 




cellular 














LM 


RoSS__A_Antigen 
S100 calcium-binding 
protein A11 (calgizzarin) 


69 
906 






3.215 


47.903 


S100A11 




S100 calcium-binding


910 


115442 


P05109


6.51 


10.834 


S100A8 




protein A8 (calgranulin A) 
















S100 calcium-binding


931 


6094219 


P50117 


6.37 


13.291 


S100A9 




protein A9 (calgranulin B) 














DMS79 


Serine/threonine protein 
phosphatase 2A, 65kDa 
regulatory Subunit A, 
P isoform 


14 


5915686 


P30154 


4.84 


66.202 


PPP2R1B 




SET translocation (myeloid 


376 


1711383 


G01105 


4.12 


32.103 


SET 




leukemia-associated) 
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ID 


Name 


Spot# 


NCBI 


GenBank 


P/ 




Official 


Source 






Accession 


Number 






gene 






Number 












Small glutamine-rich 


476 


8134666 


O43765


4.81 


34.063 


SGT 




tetratricopeptide repeat
















(TPR)-containing 












SFN 




Stratifin 


577 


398953 


P31947 


4.68 


27.774 


LM 


Superoxidedism CuZn 


792 


134611 


P00441 


5.6 


17.3 


SOD1 


LM 


Superoxide DismMN, 
superoxide dismutase 2, 
mitochondrial 


737 


134665 


P04179 


7.887 


20.78


SOD2 


LM 


TCP 1 P subunit 


202 






5.89 


59.841 




LM 


TCTP (translationally-controlled


680 


4507669 


NP_003286 


4.688 


25.143 


TPT1 




tumor protein 1) 








4.689 






LM 


Thioredoxin


896 








LM 


Tplastin HSP70 


125 






C DCO 






LM 


Transthyretin 


842 






C CQQ 

o.oyo 


1 A 71 A 


TPI1 


A549 


Triosephosphate isomerase 


672 


136060 


P00938 


7.2 


25.5 


L95 


Tropomyosin, cytoskeletal 


550 


136096 


rn ooo a 
P12o<c4 




11 Q 






type, tropomyosin 5 










32.733 


TPM4 


LM 


Tropomyosin 4 


548 


13274400 


AAK17926 


4.377 


L95 


Troponin T 


866 


408217 


AAB27731 








L95 


Troponin T 


778 


408217 


AAB27731 






TUBB 




Tubulin, β polypeptide


229 


4507729 


NP_001060


4.78 


49.907 


DMS 79 


Tumor associated hydroquinone 34 


6644167 


AF207881 










(NADH) oxidase tNOX












YWHAE 




tyrosine 3-monooxygenase/ 


576 


1168198 


P4266 


4.63 


29.174 




tryptophan 5-monooxyge- 
















nase activation protein, 
















epsilon polypeptide










26,645 


YWHAZ 




tyrosine 3-monooxygenase/ 


615 


112695 


P29312 


4.73 




tryptophan 5-monooxy- 
















genase activation protein, 
















zeta polypeptide
















Tyrosine 3-monooxygenase/ 


579 


112690 


P27348 


4.68 


27.764 


YWHAQ 




tryptophan 5-monooxy- 
















genase activation protein, 
















theta polypeptide 












UCHL1 


A549 


Ubiquitin carboxyl-terminal 
esterase L1 (ubiquitin 
thiolesterase), UCH-L1; 
PGP 9.5, GST mu 


656 


136681 


P09936 


5.283 


27.745 


L95 


Unnamed protein product 


1270 


7023092 


BAA91833 




31.263 




A549 


Urokinase plasminogen 
activator 


842 


487123 


S39495 


6.01 




LM 


Vid1 


293 






4.712 


47.485 




LM 


Vid2 


294 






4.614 


46.369 




LM 


Vid4 


337 






4.464 


45.322 




Vimentin 


294 










VIM 


A427 


Vimentin 


606 


4507894 


NM_003380






VIM 


A549


Vimentin 


505 


418249 


P08670






VIM 


A549 


Vimentin 


47 


340234 


M25246 






VIM 
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In addition to 2-D gel analysis, most lung adenocar- 
cinomas are examined at the genomic level using restric- 
tion landmark genome scanning, and by mutation analy- 
sis for a small number of genes. Transcriptomic analysis is 
done primarily using oligonucleotide microarrays, as part 
of our efforts to derive a molecular based classification of 
lung adenocarcinomas that is more predictive of clinical 
behavior for this group of tumors than current classifi- 
cation schemes. We also have similar molecular analyses 
of control lung tissue obtained from multiple sources
including adjacent lung tissue from lung cancer patients 
as well as tissues obtained from non-cancer resected lung. 

Only a fraction of the information in the 2-D patterns has 
been linked across all studies and analyses. The lung pro- 
tein database contains the basic descriptive data of var- 
ious samples analyzed, the images of the 2-D patterns 
that resulted from these samples, the quantitative spot 
data and information about which spots have been 
matched to each other, and conclusions or findings about 
spots. The database is intended to allow not only the
retrieval of existing data, but also to mine new information 
and knowledge about protein expression in lung cells. 
Data mining activities consist, for example, of reviewing 
previous studies and finding out which 2-D gel patterns 
and protein spots are interesting for post-planned analy- 
sis and new discoveries. Such discoveries derive from: 
(1) identification of proteins that exhibit interesting expres- 
sion profiles in 2-D patterns that have been regrouped 
from different experiments and studies; (2) expanded 
statistical analyses that cover protein expression patterns 
involving large numbers of experiments and images; 
(3) relating our data involving proteins to outside informa- 
tion; and (4) relating proteomic data to genomic data. 



4 Use of the database for post planned 
analysis 

4.1 Virtual matching 

Interactive software packages are used to automatically 
detect and quantify spots and to match spots between 
different protein patterns, with visual editing to correct 
any errors in computer based matching. The spot match 
program has created indices that allow investigators to 
quickly navigate through many gels and easily compare 
spots on images from many different experiments and 
studies, discover proteins of interest, and access and 
view relevant data. Here the term "match" is used as a 
logical "transitive" relation, which means if spot A is 
matched to spot B and spot B is matched to spot C 
then the spots A and C are considered matched. The 
lung protein database contains data on proteins detected 



on various 2-D gels. Since all gels derived from whole 
cell or tissue lysates in the lung protein database are 
tied into a single hierarchy, protein identification data 
recorded for a spot is used to derive protein data for its 
matched spots using an advanced query capability of 
the database. This is known as "virtual matching" or "vir- 
tual protein identification", which allows investigators to 
access and view all matched images and the corres- 
ponding information from the lung protein database. 
With a click on a spot, one gets the result shown in 
Fig. 4. The virtual protein identification feature does not 
provide a 100% level of certainty of protein identification, 
but it makes possible the display of spots of interest. A 
combination of automated recognition and manual edit- 
ing generally yields an accurate record of protein infor- 
mation in the database for previously unknown proteins. 
With this approach, the lung protein database will evolve 
and mature to include all correct data for further analysis 
and data mining. 
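
The transitive "match" relation and the propagation of an identification to all matched spots can be sketched with a disjoint-set (union-find) structure. This is only a minimal illustration, not the lung protein database's actual query mechanism; the class name, method names and spot labels such as "gelA:294" are hypothetical.

```python
class SpotMatcher:
    """Groups spots into match classes so that matching is transitive."""

    def __init__(self):
        self.parent = {}    # spot -> parent spot in the disjoint-set forest
        self.identity = {}  # spot -> protein name recorded directly for that spot

    def find(self, spot):
        """Return the representative spot of the match class containing `spot`."""
        self.parent.setdefault(spot, spot)
        root = spot
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited spot straight at the root.
        while self.parent[spot] != root:
            next_spot = self.parent[spot]
            self.parent[spot] = root
            spot = next_spot
        return root

    def match(self, a, b):
        """Record that spot a is matched to spot b."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b

    def matched(self, a, b):
        """True if a and b are matched directly or through other spots."""
        return self.find(a) == self.find(b)

    def identify(self, spot, protein):
        """Record a protein identification made on one particular spot."""
        self.find(spot)
        self.identity[spot] = protein

    def virtual_identifications(self, spot):
        """Protein names recorded on any spot matched to `spot`."""
        root = self.find(spot)
        return sorted({p for s, p in self.identity.items() if self.find(s) == root})


matcher = SpotMatcher()
matcher.match("gelA:294", "gelB:301")   # A matched to B
matcher.match("gelB:301", "gelC:288")   # B matched to C
matcher.identify("gelA:294", "vimentin")
candidates = matcher.virtual_identifications("gelC:288")
```

Because the identification is inherited rather than measured, `candidates` should be read as a suggestion for spot C, in line with the caveat above that virtual identification does not give 100% certainty.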

4.2 Integrating protein spot data with MS data 

As interest in proteomic analysis grows, a number of very 
large public databases have become available for accessing protein
data via the internet. Public databases offer sophisticated
text search and keyword search, which links any
entered keyword to all protein information associated with 
that keyword, to ensure easy access to all relevant data. 
Protein identification using MALDI-MS relies on database 
searches and usually has three components: (1) peak 
detection which allows automatic determination of pep- 
tide masses; (2) search in protein sequence databases 
(SWISS-PROT and/or GenBank) for protein entries that 
match the masses; and (3) certainty calculation which 
determines the quality of the match for each protein in 
the list [4]. An example of such a software tool is PepFrag,
which searches protein and DNA sequence databases
and can use different types of mass spectrometric information [5].
Fenyo [6] described methods and software
tools in proteomics for identifying and characterizing proteins,
emphasizing MS combined with database
searching. Proteolytic peptide mapping and genome
database searching provide an automated means for 
identifying proteins, and the certainty of the results is 
computed by the number of masses matched for each 
protein [7]. Another useful tool is FindMod
(http://www.expasy.ch/sprot/findmod.html) for the systematic
characterisation of proteins using mass spectrometry [8].
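
The search and certainty steps described above can be illustrated by counting, for each candidate protein, how many observed peptide masses fall within a mass tolerance of a theoretical peptide mass. This is a simplified sketch, not the MOWSE score and not the algorithm of PepFrag or FindMod; the function names, the 50 ppm default and all masses below are assumptions chosen for illustration.

```python
def ppm_error(observed, theoretical):
    """Signed mass error of an observed peak in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def rank_candidates(observed_masses, candidates, tolerance_ppm=50.0):
    """Rank candidate proteins by the number of observed masses they explain.

    candidates maps a protein name to its list of theoretical peptide masses.
    Returns (name, masses matched, percent of masses matched), best first.
    """
    results = []
    for name, theoretical_masses in candidates.items():
        matched = [
            m for m in observed_masses
            if any(abs(ppm_error(m, t)) <= tolerance_ppm for t in theoretical_masses)
        ]
        percent = 100.0 * len(matched) / len(observed_masses) if observed_masses else 0.0
        results.append((name, len(matched), percent))
    results.sort(key=lambda r: r[1], reverse=True)
    return results


# Hypothetical peak list and candidate digests (masses in Da, invented values).
observed = [1000.00, 1500.02, 2000.50]
digests = {
    "candidate_A": [1000.01, 1500.00, 1800.00],
    "candidate_B": [2000.50],
}
ranking = rank_candidates(observed, digests)
```

A real search would also weight matches by peptide frequency and protein size, which is what scores such as MOWSE add on top of a plain count.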

We have created MS data forms that contain information 
used in mass spectrometry queries, summary information 
(Rank, MOWSE score, % Masses Matched, MW, pI, Species,
Accession #, Protein Name) and additional information
(Summary ID, Submitted Mass, Matched Mass, Delta
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Figure 4. Virtual protein identification by clicking a spot. 



PPM, Start, End, Peptide Seq, Modifications, Unmatched 
Masses). An example of the MS data form is shown in 
Fig. 5. Integrating the lung protein database with MS 
data provides a record of protein identification and high 
level of integration with other public databases, although 
substantial effort is required for data collection. We are 
currently evaluating an automated or semi-automated 
method of pulling these data when new information, 
which is relevant to our objectives, is available. 



4.3 Integrating protein data with microarray data 

As technology evolves, new computer aids and methods 
are introduced for genomic analysis as well as proteo- 
mic analysis. With respect to DNA microarray platforms, 
a current goal is to construct lung specific cDNA micro- 
arrays for lung cancer investigations. In the meantime 
RNA expression data for lung cancer is being collect- 
ed using an Affymetrix oligonucleotide based system. 
This system automates the identification and quantifica- 
tion of microarray spots. Data files contain integrated 
intensities for each spot and ratios showing fold changes 
per spot. The use of oligonucleotide based microarrays 
for RNA analysis in lung cancer by our group has resulted 



in a massive amount of data. Integration of protein information
in the lung protein database with microarray data
allows us to extend data analysis capability to encompass
RNA and protein data for a subset of genes. 



5 Some findings derived from the lung 
cancer protein database 

5.1 Unique proteomic pattern of small cell 
lung cancer 

A major goal of our proteomic and genomic studies of 
lung cancer is to derive novel classification schemes that 
have utility in making a diagnosis, predicting outcome and 
in making therapeutic decisions. An important first step in 
this direction is to determine the ability of proteomic pro- 
filing to distinguish between known types of lung cancer. 
Specific protein differences between different types of 
cancer have been identified by other groups. In a recent 
study of breast, ovary and lung tumors, 20 differentially 
expressed proteins were identified [9] and in a prior study, 
16 polypeptides were found to be associated with different
histopathological features of lung cancer [10, 11]. In a
study of 25 adenocarcinomas of the lung, 12 small cell 
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Figures. MS data form 



lung cancers, and 16 squamous cell tumors, by our group
(manuscript submitted) an initial analysis of protein 2-D 
patterns uncovered a group of 52 protein spots that dif- 
fered in average integrated intensity between the three 
groups. Performing simple two-sample t-tests gave p
values of less than 0.05 for the 52 spots for at least one 
of the pairs of groups. Most of the spots differed between 
small cell and the remaining two diagnostic groups, with 
47 spots differing significantly between small cell and 
adenocarcinoma groups and 44 between small cell and 
squamous (p<0.05). Between the adenocarcinoma and 



squamous groups 12 spots with difference of this signifi- 
cance were found. Summary data for some of the spots is 
presented in Table 4. The first two principal components 
of the data are graphed in Figure 6, and show that as a 
group the spots distinguish small cell tumors from the 
other two tumor types fairly easily. 
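
For readers unfamiliar with the test used above, the statistic behind a two-sample t-test can be sketched as follows. The text does not say whether equal variances were assumed, so the pooled-variance (Student) form here is an assumption, and the intensity values are invented for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1.0 / nx + 1.0 / ny))
    return t, df


# Hypothetical integrated intensities of one spot in two tumor groups.
group_one = [1.0, 2.0, 3.0]
group_two = [2.0, 3.0, 4.0]
t_stat, dof = two_sample_t(group_one, group_two)
```

The two-sided p value is then the probability that a t-distributed variable with `dof` degrees of freedom exceeds `abs(t_stat)` in either direction; a statistics package is normally used for that last step.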

We have identified 39 of this set of 52 spots by either 
N-terminal sequencing and/or MS of spot digests. Small
cell lung cancers were characterized by higher average 
amounts for some proteins associated with cell prolifera- 
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Table 4. 39 identified protein spots found to differ between small cell, adenocarcinoma, and squamous tumors of the lung 
(n = 12, 25, 16). The t-test columns give p values from the two-sided two-sample t-test comparing each pair of
groups.

Spot  Official  Mean      Mean      Mean      t-test       t-test       t-test       Unigene description
#     gene      adeno-    squamous  small     adenocarci-  small cell   adenocarci-
      symbol    carcinoma           cell      noma vs      vs squamous  noma vs
                                              small cell                squamous

294   VIM       1.36      1.16      0.53      0.010        0.016        0.509        vimentin
319   ALB       2.13      1.67      0.73      0.001        0.005        0.231        albumin
666   ALB       0.72      0.59      0.20      0.002        0.030        0.461        albumin
800   ALB       2.34      1.80      0.63      0.010        0.034        0.383        albumin
873   LGALS1    1.95      1.69      0.83      0.000        0.002        0.310        lectin, galactoside-binding, soluble, 1 (galectin 1)
928   ARF1      0.22      0.19      0.06      0.012        0.046        0.607        ADP-ribosylation factor 1
522   ANXA5     0.46      0.26      0.39      0.429        0.202        0.012        annexin A5
515   PCNA      0.15      0.18      0.36      0.002        0.011        0.464        proliferating cell nuclear antigen
577   SFN       0.78      1.39      0.41      0.129        0.002        0.029        stratifin
626   HSPB1     0.87      1.18      0.30      0.000        0.002        0.128        heat shock 27 kD protein 1
631   HSPB1     1.04      1.35      0.46      0.003        0.017        0.277        heat shock 27 kD protein 1
793   NME1      0.36      0.43      0.59      0.003        0.033        0.253        non-metastatic cells 1, protein (NM23A)
807   LAP18     0.03      0.05      0.92      0.000        0.000        [illeg.]     leukemia-associated phosphoprotein p18 (stathmin)
809   LAP18     [illeg.]  [illeg.]  [illeg.]  0.000        0.000        [illeg.]     leukemia-associated phosphoprotein p18 (stathmin)
931   S100A9    [illeg.]  [illeg.]  [illeg.]  [illeg.]     0.001        [illeg.]     S100 calcium-binding protein A9 (calgranulin B)
104   PPP2R1B   0.17      [illeg.]  [illeg.]  0.000        0.001        [illeg.]     protein phosphatase 2 (formerly 2A), regulatory subunit A (PR 65), beta isoform
110   P4HB      0.10      0.10      0.30      0.014        0.049        0.906        procollagen-proline, 2-oxoglutarate 4-dioxygenase (proline 4-hydroxylase), beta polypeptide (protein disulfide isomerase; thyroid hormone binding protein p55)
183   INA       0.04      0.04      0.16      0.000        0.000        0.751        internexin neuronal intermediate filament protein, alpha
229   TUBB      0.14      0.27      0.83      0.000        0.000        0.028        tubulin, beta polypeptide
289   KRT15     0.36      0.29      0.65      0.028        0.009        0.343        keratin 15
295   ENO2      0.10      0.23      0.39      0.000        0.065        0.026        enolase 2 (gamma, neuronal)
376   SET       0.25      0.17      0.71      0.000        0.000        0.031        SET translocation (myeloid leukemia-associated)
439   CKB       0.11      0.05      0.16      0.033        0.000        0.004        creatine kinase, brain
460   ANXA1     0.43      0.42      0.59      0.014        0.026        0.691        annexin A1
476   SGT       0.16      0.19      0.33      0.000        0.000        0.241        small glutamine-rich tetratricopeptide repeat (TPR)-containing
576   YWHAE     0.40      0.38      0.82      0.000        0.001        0.697        tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, epsilon polypeptide
579   YWHAQ     0.52      0.55      0.91      0.000        0.006        0.703        tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, theta polypeptide
615   YWHAZ     0.93      1.09      1.79      0.000        0.003        0.336        tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta polypeptide
656   UCHL1     0.17      0.32      0.85      0.000        0.005        0.153        ubiquitin carboxyl-terminal esterase L1
855   RBP1      0.42      0.41      0.77      0.006        0.014        0.961        retinol-binding protein 1, cellular
856   CRABP2    0.25      0.38      0.63      0.000        0.017        0.037        cellular retinoic acid-binding protein 2
902   ERH       0.38      0.35      0.76      0.000        0.000        0.455        enhancer of rudimentary (Drosophila) homolog
910   S100A8    1.46      1.43      0.35      0.040        0.001        0.950        S100 calcium-binding protein A8 (calgranulin A)
934   KRT17     0.16      0.30      0.15      0.768        0.073        0.013        keratin 17
693   ALB       2.63      1.98      0.92      0.000        0.008        0.138        albumin
737   SOD2      1.17      1.22      0.54      0.013        0.001        0.836        superoxide dismutase 2, mitochondrial
789   COL15A1   0.57      0.50      0.26      0.031        0.186        0.658        collagen, type XV, alpha 1
906   S100A11   2.95      2.62      0.53      0.000        0.000        0.506        S100 calcium-binding protein A11 (calgizzarin)
924   LCP1      0.18      0.13      0.05      0.000        0.004        0.034        lymphocyte cytosolic protein 1 (L-plastin)

[illeg.] marks a value that could not be read in the scanned source.

larger amounts of several protein spots detected on these 
gels that did not occur in similar gels made from cell lines 
and were thought to be cleavage products from proteins 
present in cells or plasma surrounding the tumor cells 
(e.g. cleaved albumin). The number of protein spots that
differed between lung adenocarcinomas and squamous
tumors was smaller than the number that
distinguished small cell lung cancer from the other
two lung cancer types. ENO2 was lowest in the adenocarcinoma
group, while ANXA5 and CKB were lowest and
KRT17 and SFN highest in the squamous carcinoma sam- 
ples. Several interesting spots found in the study remain 
to be definitively identified. 

5.2 Correlations between RNA and protein 
expression 

The availability of mRNA expression data from micro- 
arrays or Affymetrix chips for the same samples for which 
we have protein 2-D gel data permits several additional 
types of questions to be asked. We have thus far enter- 
tained only simple models of protein/mRNA relationships 
that ask which mRNA levels are most correlated with pro- 
tein spot sizes. Figure 7 depicts such a correlation matrix 
using colors rather than numerical data, since this makes 
it easier to visualize the relationships. In cases for which 
the identity of the protein spot is known such investiga- 
tions can answer the question of how well mRNA levels 
for a protein predict that protein's abundance. In cases 
of protein spots that have not yet been identified, or iden- 
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Figure 6. First two principle components for 52 protein 
spots distinguishing between lung tumor types. Small 
cell lung cancer samples are shown as squares, adeno- 
carcinomas as circles and squamous lung tumors as 
triangles. 



tion such as proliferating cell nuclear antigen (PCNA) and 
oncoprotein 18 (Op18) [12-15], particularly the once- 
phosphorylated form of Op18, as well as protein products 
of the UCHL1, RBP1, CRABP2, KRT15, and TUBB genes
among others. Squamous cell and adenocarcinoma sam- 
ples had greater amounts of the S100 proteins S100A8, 
S100A9, and S100A11, as well as larger average amounts
of both the unphosphorylated and phosphorylated 27 kD 
heat shock protein (HSPB1). These two groups also had 
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Figure 7. Correlation matrix of 30 protein spots (columns) 
with mRNA levels as measured by 200 probe-sets on 
Affymetrix HuFL chips. The correlation coefficients are 
depicted with colors, bright red being near-perfect corre- 
lation (r = 1) and bright green anticorrelation (r = -1). The 
figure was made using the TreeView software (rana.lbl. 
gov/EisenSoftware.htm). 

tified without high confidence, such correlations can lead 
to or confirm hypothetical spot identifications. More gen- 
erally one can search for larger groups of proteins and 
mRNA whose abundances are controlled by some com- 
mon mechanism. 
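
A correlation matrix of the kind shown in Figure 7 can be sketched as below. The text speaks only of correlation coefficients running from -1 to 1, so the use of Pearson's r is an assumption, and the spot names, probe-set names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(protein_levels, mrna_levels):
    """Correlate every protein spot with every mRNA probe set.

    Both arguments map a name to measurements taken on the same samples,
    listed in the same sample order.
    """
    return {
        spot: {probe: pearson(spot_values, probe_values)
               for probe, probe_values in mrna_levels.items()}
        for spot, spot_values in protein_levels.items()
    }


# Invented measurements for four samples.
spots = {"spot_1": [1.0, 2.0, 3.0, 4.0]}
probes = {"probe_a": [2.0, 4.0, 6.0, 8.0], "probe_b": [4.0, 3.0, 2.0, 1.0]}
matrix = correlation_matrix(spots, probes)
```

In this toy input `probe_a` rises with the spot and `probe_b` falls, which corresponds to the two color extremes described in the Figure 7 caption. A constant profile would make the denominator zero, so real data would need a guard for that case.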



5.3 Identification of novel lung cancer markers 

We have utilized a proteomic approach to identify pro- 
teins that commonly induce an antibody response in lung 
cancer. Such identified proteins or their corresponding 
autoantibodies likely have substantial utility for cancer 
diagnosis. There is also evidence that autoantibodies 
may be present prior to clinical diagnosis and therefore 
detection of autoantibodies or of circulating antigens 
may have utility for screening and early diagnosis of can- 
cer. We have identified a battery of proteins that induce 
autoantibodies that are specific for different types of can- 
cer. We have identified a panel of autoantibodies that are
detectable in serum of lung cancer patients at the time 
of diagnosis. The availability of a database of protein 
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expression in lung cancer has facilitated the identification 
of proteins that induce autoantibodies in addition to 
providing valuable information regarding the expression 
pattern of such antigens in different tumor types and cell 
lines. One such antigen we have identified in lung cancer 
is protein PGP 9.5 (Fig. 8) (Brichory et al., manuscript
submitted) [16]. PGP 9.5 was identified as a protein in lung
cancer that induces autoantibodies as part of a study in 
which sera from 64 newly diagnosed patients with lung 
cancer, from 99 patients with other types of cancer and 
from 71 noncancer controls were analyzed for antibody- 
based reactivity against lung adenocarcinoma proteins 
resolved by 2-D PAGE. Gels containing separated pro- 
teins were blotted and subsequently hybridized with indi- 
vidual sera from patients or controls. Unlike controls, auto- 
antibodies against a protein identified by MS as protein 
gene product 9.5 (PGP 9.5) were detected in sera in 9 out 
of 64 patients with lung cancer. 

Circulating PGP 9.5 antigen was detected in sera from two 
additional patients with lung cancer, without detectable 
PGP 9.5 autoantibodies. PGP 9.5 is a neurospecific poly- 
peptide previously proposed as a marker for nonsmall cell 
lung cancer, based on its expression in tumor tissue. Using 
A549 lung adenocarcinoma cell line, we have demonstrated 
that PGP 9.5 was present at the cell surface, as well as 
secreted. Thus, the findings of PGP 9.5 antigen and/or anti- 
bodies in serum of patients with lung cancer suggest that 
PGP 9.5 may have utility in lung cancer screening and diag- 
nosis, as part of a panel of such proteins or their corres- 
ponding antibodies, which we have identified. 



6 Web pages 

The relational database for storage of sample, image, 
protein information and other related data is being con- 
structed in a stepwise fashion. The construction of a 
comprehensive database to collect all pertinent informa- 
tion is rather challenging and necessitates substantial 
resources. A similar effort in this area is WebGel,
a web based gel database analysis system that contains
previously quantified gel data generated from a
stand-alone quantitative gel analysis system [16]. Public
WebGel demonstration databases currently available
can be found on the web site (http://www-lecb.ncifcrf.gov/webgel,
WebGel database). The task of web based
retrieval of data from the protein database is rather com- 
plex as there are different kinds of data that may need 
to be retrieved. The microarray data could be stored in 
the database instead of Excel files, and the Access 2000 
database that the MS team utilizes could be transferred to 
the database. Tables are being built to eliminate any 
handwritten collection of data. Developing a database is 
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Figure 8. 2-D PAGE and Wes- 
tern blot analysis of A549 lung 
adenocarcinoma cell proteins. 
Panel 1 shows A549 2-D protein 
pattern after silver staining. The 
boxed area is shown in panel 2, 
in which arrows point to the 
location of PGP 9.5 forms 
(spots P1 to P3) recognized by 
sera from patients with lung 
cancer and the position of the 
form P4 recognized by a poly- 
clonal rabbit anti-PGP 9.5 anti- 
serum, which also recognizes 
P1-P3. Panel 2 shows close- 
ups of western blots hybridized 
with two different sera from 
patients with lung adenocar- 
cinoma that showed reactivity 
against PGP 9.5 proteins. 



hard because of the complex and very large amount of
unstructured data generated. There are conflicting 
pressures between "using what we've got already" 
and constructing something better. Sometimes there is 
a natural break in the data, such as when a shift is 
made from one platform type to another. Then one 
could "pile up" old data and organize it neatly. On the 
other hand, when new technologies are introduced, 
they require new ways of storing the data. The lung 
protein database is continuously evolving to enhance 
the relational schema to be more flexible and compre- 
hensive and to make data processing more robust and 
automatic. 

The lung protein database is a backbone to record pro- 
teome data for many different studies and to mine the 
existing data for new discoveries. The new generation 
LIPS provides investigators web-enabled interfaces to 
the laboratory databases and 2-D images with internet 
access. There is certainly a need for sharing information 
in the database on a global basis. We have used internet 
and WWW technologies to provide a distributed process 
with easy-to-use front-end user interface. Figure 9 shows 
a top level view of a web-based process for performing 
our studies from a data processing perspective. Some of 
our web pages were developed in Visual InterDev and 
ASP development environment on Microsoft and some 
were developed in Oracle 8i and WebDB web application 
environment on Solaris. As an example, the MS data web 
page is shown in Fig. 10. Detailed "how-to" documentation
is provided as on-line help for recently extended
capabilities of LIPS. 




Figure 9. Web-based process of using the lung protein database.



7 Conclusion 

The value of the database we have constructed depends 
to a large measure on its content, the quality of data and 
the ease with which data can be retrieved and analyzed. 
While the amount of data generated is already quite size- 
able, it is likely that the database will continue to undergo 
substantial expansion. Proteins are being identified at a 
rapid pace, thus enhancing our ability to link protein 
expression data with RNA based expression data for cor- 
responding genes. As such, the database will play an
important role in achieving our objective of developing 
novel classification schemes for lung cancer and the 
identification of novel markers for early diagnosis. The 
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database will also serve as a useful resource for other 
investigations of lung biology and of diseases other than 
lung cancer. 

Received May 20, 2001 
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Translational regulation of human P 53 gene 
expression 



Loning Fu, Mark D. Minden and
Sam Benchimol

The Ontario Cancer Institute/Princess Margaret Hospital and
Department of Medical Biophysics, University of Toronto,
University Avenue, Toronto, Ontario, Canada M5G 2M9

In blast cells obtained from patients with acute myelogenous
leukemia, p53 mRNA was present in all the
samples examined while the expression of p53 protein
was variable from patient to patient. Mutations in the
p53 gene are infrequent in this disease and, hence,
variable protein expression in the majority of the
samples cannot be accounted for by mutation. In
this study, we examined the regulation of p53 gene
expression in human leukemic blasts and characterized
the p53 transcripts in these cells. We found control
both at the level of RNA abundance and at the
level of translation. Four experiments point towards
translational control of human p53 gene expression.
First, there is no correlation between the level of p53
mRNA and the level of p53 protein expression in blast
cells. Second, in two cell lines with similar levels of
p53 protein expression but with different levels of p53
mRNA, we found that there is preferential association
of p53 mRNA with large polysomes in the cells with
less p53 RNA. Third, translation of synthetic human
p53 transcripts in cell-free extracts is inhibited by the
p53 3'UTR. Fourth, the p53 3'UTR, when present
in cis, can repress translation of a heterologous transcript.
These observations raise the possibility that
[...] by RNA-binding factors acting on the p53 3'UTR.
Keywords: acute myelogenous leukemia/p53/translational
control




The scarcity of p53 gene mutations in AML is not
unique to this disease. For example, p53 gene mutations
are rare in neuroblastoma, testicular tumors and HPV-
positive cervical cancer. While the p53 gene is most
commonly inactivated through mutation in human tumors,
p53 protein function can also be disrupted through
epigenetic mechanisms including protein-protein interaction
(Scheffner et al., 1990; Momand et al., 1992; Oliner et al.,
1992; Ueda et al., 1995), protein conformational change
(Milner, 1991; Ullrich et al., 1992) and nuclear exclusion
(Moll et al., 1992, 1995). Indeed, two groups have
suggested that inactivation of wild-type p53 protein in
AML occurs through a mechanism involving conformational
change of the protein (Zhu et al., 1993; Zhang
et al., 1992).

The level of p53 protein expression in primary blast
cells obtained from AML patients varies from patient to
patient. In previous studies from this laboratory, p53
protein expression was detected in only 45% (34 of
75) blast samples examined by metabolic labelling with
[35S]methionine and immunoprecipitation (Smith et al.,
1986; Benchimol et al., 1989; Slingerland et al., 1991).
Zhang et al. (1992) detected p53 protein expression in
blast samples from 75% (37 of 49) AML patients. Several
reasons may explain the absence or very low level of p53
protein expression in certain blast samples. These include
low levels of p53 mRNA, inhibition of p53 mRNA
translation and extremely rapid turnover of newly synthesized
p53 protein. In this study, we have examined the
regulation of p53 gene expression in human AML
and find control both at the level of RNA abundance and
at the level of translation. Translational regulation is
supported by experiments in which we demonstrate that
the p53 3' untranslated region (3'UTR) can repress translation
of p53 RNA and of heterologous transcripts in cell-
free extracts.



Introduction 

Human acute myelogenous leukemia (AML) is a clonal
disease arising in a very early hematopoietic progenitor
cell following multiple carcinogenic events (Wiggans
et al., 1978; Fialkow et al., 1987). Mutation of the p53
tumor suppressor gene occurs infrequently in the blast
cells of AML patients ([illegible];
Slingerland et al., [illegible]; [illegible] et al., 1994; Wattel et al.,
[illegible]). [illegible] p53 mutations have been detected
[illegible] the development of AML.
[illegible] completely wild-type p53 protein function.
Results

Expression of p53 protein in human AML

Blast cells from AML patients and the human
acute leukemia cell lines OCI-M2, OCI/AML-3 and OCI/
AML-4 were characterized for p53 protein expression by
metabolic labelling and immunoprecipitation. OCI-M2 is
a human erythroleukemia cell line (Papayannopoulou
et al., 1988) previously shown to contain a missense
mutation in the p53 coding region at codon 274 and to
have lost the homologous wild-type p53 allele (Slingerland
et al., 1991). OCI/AML-3 and OCI/AML-4 cell lines were
derived from the primary blasts of two AML patients
(Wang et al., 1989). The full-length p53 transcripts in
these cells were amplified by RT-PCR, and the products
were sequenced. We found that the p53 transcripts in
both cell lines were wild-type throughout their coding
Fig. 2. Northern blot analysis of p53 mRNA in human AML cells.
[illegible] blast samples [illegible] containing 6% formaldehyde,
[illegible]. After autoradiography, the [illegible]
filters were hybridized with a probe specific for [illegible].
The relative abundance of p53 mRNA was determined by
phosphorimage analysis after normalizing to the value of 18S
ribosomal RNA in each sample.



Fig. 1. Expression of p53 protein in human leukemia cells. (A) Cell
[illegible] were immunoprecipitated with [illegible]
antibody (PAb419) or with monoclonal antibodies against p53
[illegible] cells by Western [illegible]

regions as well as through their 5'- and 3'UTRs. The only
difference detected in the p53 transcripts expressed in
[illegible]
lines was determined by Western blot [illegible]
PAb1801. Densitometric scanning of the [illegible]
Figure 1B revealed that the amount of p53 protein in OCI/
AML-3 and OCI/AML-4 was similar and ~10-fold lower
[illegible]
[illegible] immunoprecipitation. p53 protein [illegible].
p53 mRNA levels were determined by
Northern blotting as described in the legend to Figure 2.

and as a result mutant p53 polypeptides accumulate
intracellularly.

Expression of p53 mRNA in human AML
To determine whether the differences in p53 protein
expression in leukemic blasts reflected differences in the
abundance of p53 mRNA, RNA was isolated from AML
samples and cell lines and subjected to Northern
blot analysis. The relative abundance of p53 mRNA in
cells was estimated by phosphorimage analysis after
normalizing to the value of 18S ribosomal RNA in each
sample. The results are shown in Figure 2 and [illegible]
that the 16 AML blast samples examined synthesized a
[illegible]
between p53 protein expression (on the basis of the 15
min metabolic labelling assay) and the level of p53 mRNA
[illegible]. OCI/AML-3 and OCI/AML-4 cells contained similar
amounts of p53 protein. However, the RNA blot shown
in Figure 2 indicated that the abundance of p53 mRNA
was higher in OCI/AML-3 than in OCI/AML-4. A
4- to 8-fold difference in p53 RNA was seen in repeated
Fig. 4. Association of p53 mRNA with polysomes in OCI/AML-3 (A)
and OCI/AML-4 (B) cells. The association of p53 mRNA with
[illegible] and OCI/AML-4 cells was compared. Cell
[illegible]

18S ribosomal RNA or GAPDH [illegible]

loading of RNA samples [illegible] variable

protein expression is not related to the amount of p53

mRNA in these cells.
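
The normalization described above (p53 mRNA signal divided by the 18S ribosomal RNA signal of the same lane) can be sketched as follows. This is only an illustration of the arithmetic: the function name, sample names and phosphorimager counts are hypothetical and are not taken from the paper.

```python
def relative_abundance(target_signal, reference_signal):
    """Divide a target band signal by the loading-control signal of the same lane."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return target_signal / reference_signal


# Hypothetical phosphorimager counts (arbitrary units), not values from the paper.
lanes = {
    "sample_A": {"p53": 8000.0, "rRNA_18S": 20000.0},
    "sample_B": {"p53": 1500.0, "rRNA_18S": 15000.0},
}

# p53 mRNA abundance expressed relative to 18S rRNA in each lane.
normalized = {
    name: relative_abundance(lane["p53"], lane["rRNA_18S"])
    for name, lane in lanes.items()
}

# Fold difference between the two samples after normalization.
fold_difference = normalized["sample_A"] / normalized["sample_B"]
print(normalized, fold_difference)
```

Dividing by the loading control before comparing lanes is what allows a statement such as "4- to 8-fold difference" to refer to mRNA abundance rather than to unequal amounts of RNA loaded.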

Association of p53 mRNA with polysomes
To test whether p53 gene expression [illegible]
control in [illegible], the association of p53 mRNA with
polysomes in OCI/AML-3 and OCI/AML-4 cells was
[illegible]. [illegible] is more translationally
[illegible] OCI/AML-4, as the above
[illegible]
present in OCI/AML-4 [illegible]
polysomes compared with OCI/AML-3. Cells were collected
and lysed in the presence of cycloheximide and
MgCl2 which stabilize the association of ribosomes with
mRNA. The lysates were sedimented through a [illegible]
sucrose gradient and fractions were collected. RNA was
extracted from each fraction and analyzed for the presence
of p53 mRNA by dot-blot hybridization with a 32P-labelled
p53 cDNA probe. The gradients were calibrated with
polysomes prepared from lysates by precipitation with
100 mM MgCl2. Polysomes were found at the bottom of
the gradient in fractions 5-10, while monosomes were
found in fractions 1-4. p53 mRNA from OCI/AML-4
cells was associated with larger polysomes than p53
mRNA from OCI/AML-3 cells (Figure 4). In OCI/AML-4
cells 39% of the p53 mRNA was found in fractions 7-
10 containing high molecular weight polysomes while in
OCI/AML-3 cells 21% of the p53 mRNA was found in
these same fractions. As an internal control, the distribution
of ribosomal protein L35 RNA was compared and shown
to be similar in OCI/AML-3 and OCI/AML-4 (data

not shown).
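
The percentages quoted above are the share of total p53 dot-blot signal that falls in gradient fractions 7-10. A minimal sketch of that calculation is given below; the per-fraction signals are invented placeholders, chosen only so that the shares come out at the reported 39% and 21%, and the function name is hypothetical.

```python
def heavy_polysome_share(fraction_signals, first=7, last=10):
    """Percentage of total signal found in fractions first..last (1-based, inclusive)."""
    total = sum(fraction_signals)
    if total <= 0:
        raise ValueError("total signal must be positive")
    heavy = sum(fraction_signals[first - 1:last])
    return 100.0 * heavy / total


# Hypothetical dot-blot signals for fractions 1-10 (arbitrary units).
# These are illustrative placeholders, not measurements from the paper.
aml4_like = [10, 10, 10, 10, 7, 14, 10, 10, 10, 9]
aml3_like = [15, 15, 15, 15, 10, 9, 6, 5, 5, 5]

share_aml4 = heavy_polysome_share(aml4_like)
share_aml3 = heavy_polysome_share(aml3_like)
print(share_aml4, share_aml3)
```

Expressing the result as a share of the total signal in each gradient makes the two cell lines comparable even though they contain different absolute amounts of p53 mRNA.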

Analysis of the 5' end of p53 mRNA

The human p53 gene has been shown to have a cluster of
seven major transcription initiation sites and several
minor sites lying further upstream (Tuck and Crawford,
1989). Transcripts initiating from the minor sites would
have a longer 5'UTR with potential to form a stable stem-
loop structure close to the 5' cap. Such structures would

[illegible]. Recently, mouse p53 protein was shown to
bind to the 5'UTR and to inhibit translation of its own
mRNA in an in vitro assay system (Mosner et al., 1995).
Stable stem-loop structures in the 5'UTR regions of [illegible]
mRNA transcripts have been shown to [illegible]
translation initiation by interfering with the [illegible]
translation initiation factors or by serving as [illegible]
proteins that inhibit translation (Feng and [illegible]

[illegible]).

[illegible] in different blast samples and [illegible]

OCI/AML-4 cell lines and from [illegible] patients
was examined. After [illegible]
were resolved by [illegible]
acrylamide gel. As shown in Figure 5B, the predominant

protected fragment in all the samples [illegible]
nucleotides in length indicating a common site for initiation
Fig. 5. RNase protection assay. (A) The map of the p729 plasmid. The
p729 plasmid was constructed as described under Materials and
methods. After linearization with HindIII a 729 nucleotide antisense
RNA probe was generated by transcription with SP6 RNA polymerase
yielding protected p53 fragments of ~385 nucleotides due to p53
transcripts initiating from one of the major start sites (a) and 449
nucleotides due to p53 transcripts initiating from the most 5' of the
minor transcription start sites (b). (B) The 729 nucleotide [32P]UTP-
labelled antisense RNA probe was annealed to 30 ng of total RNA
extracted from OCI/AML-3 and OCI/AML-4 cell lines and seven
AML blast samples before digestion with RNase A and RNase T1.
The protected fragments were separated by electrophoresis on a 6%
polyacrylamide-8 M urea gel and visualized by autoradiography. The
positions and size (nucleotide length) of 5' end-labelled fragments of
[illegible]-digested pBR322 plasmid DNA are indicated on the left. The
lower arrow indicates the position of the major protected fragment;
the top arrow indicates the undigested probe.



of p53 gene transcription in leukemic blasts at the major
start site. These data indicate that, in contrast with murine
p53 mRNA, stable secondary structures are unlikely to
exist at the 5' end of human p53 mRNA.

Analysis of the 3' end of p53 mRNA
Human p53 mRNA contains a long 3'UTR of 1176
nucleotides with an Alu-like repetitive sequence element
of ~470 bp located immediately upstream of the poly(A)
tail (Matlashewski et al., 1984). The Alu-like sequence is
in the reverse transcriptional orientation with respect to
the p53 gene. Furthermore, the Alu-like sequence is
missing in murine p53 transcripts and it interrupts a region
in human p53 mRNA which shows homology to mouse
p53 mRNA. When analyzed with the FOLD program of
GCG, the Alu-like element in the 3'UTR of human p53
mRNA is predicted to form an independent secondary
structure that does not have long-range interactions with
other regions of p53 mRNA. In the presence of a poly(A)
tail, the secondary structure formed by the Alu-like element
is predicted to remain essentially intact except that a 50
nucleotide U-rich sequence at the 5' boundary of the Alu-
like sequence will interact with the poly(A) tail. The
Fig. 6. In vitro translation of synthetic p53 RNA containing variable
portions of the 3'UTR. (A) Plasmid template used to synthesize p53
RNA in vitro. The p2516 plasmid was constructed by inserting the
entire 2.5 kb wild-type p53 cDNA sequence downstream of the
bacteriophage SP6 promoter in a pSP64-derived plasmid. Transcription
from the SP6 promoter present in p2516 leads to the production of
transcripts in which the first 10 nucleotides are derived from plasmid
sequences while the remaining nucleotides are derived from the p53
gene beginning at the +7 position of native p53 transcripts initiating
from one of the major transcription start sites (Tuck and Crawford,
1989). Linearization of p2516 at the EcoRI site before in vitro
transcription generates a full-length, Alu-containing p53 transcript
(T2516); linearization at the BamHI site provides a template for the
synthesis of a truncated p53 transcript missing a portion of the 3'UTR
containing the Alu sequence (T2034). Both transcripts were
polyadenylated in vitro to generate T2516An and T2034An. The open
rectangles shown in the transcripts represent the position of the p53
coding region. (B) 50 ng of the in vitro-synthesized T2034, T2516,
T2034An and T2516An p53 RNAs were translated in a rabbit
reticulocyte lysate at 30°C for 30 min in the presence of
[35S]methionine followed by immunoprecipitation, SDS-PAGE and
autoradiography. In the 3X T2516 and 3X T2516An lanes, 150 ng of
T2516 or T2516An RNA was added to the in vitro translation
reaction. The right panel presents the results of a Northern blot in
which 50 ng of synthetic p53 RNA was applied to an agarose-
formaldehyde gel, blotted and hybridized to 32P-labelled human p53
cDNA.



extended base pairing between U and A residues will 
further stabilize the secondary structure formed by the 
Alu-like element. To determine whether or not the Alu- 
like repeat present in human p53 mRNA might constitute 
a negative regulatory element during translation, a series 
of in vitro transcription-translation experiments was
performed.

An SP6-derived plasmid containing human wild-type
p53 cDNA including the entire 3'UTR was constructed
(p2516 in Figure 6A). p2516 was linearized with EcoRI
or with BamHI and used as a template for in vitro
transcription. In some reactions, a poly(A) tail of 200-
300 adenylic acid residues was added to synthetic p53
RNA using poly(A) polymerase. In this way, four synthetic
p53 transcripts were generated: T2516An and T2516
represent full-length, Alu-containing transcripts with or
without a poly(A) tail; T2034An and T2034 represent




shorter Alu-deficient p53 transcripts with or without a
poly(A) tail. These transcripts were then used as templates
for translation in a rabbit reticulocyte lysate containing
[35S]methionine. p53 protein synthesized in vitro was
immunoprecipitated with PAb421 monoclonal antibody
and visualized by autoradiography (Figure 6B). The
amount and integrity of the synthetic p53 RNAs added to
the in vitro translation reactions was monitored by agarose
gel electrophoresis and Northern blotting as shown in the
right panel of Figure 6B. Densitometric tracing of the
data indicated that the Alu-containing, non-polyadenylated
transcript T2516 was translated ~3-fold less efficiently than
the Alu-deficient, non-polyadenylated transcript T2034. In
addition, the polyadenylated, Alu-containing transcript
T2516An was translated ~20-fold less efficiently than the
polyadenylated, Alu-deficient transcript T2034An. These
data indicate that the Alu-like element present in the p53
3'UTR can inhibit p53 mRNA translation in vitro, even
in the absence of a poly(A) tail. The predicted interaction
of the poly(A) tail with the Alu-like element appears to
increase further the inhibition of translation.
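
The fold differences quoted above compare protein signal per unit of input RNA between two transcripts. The sketch below illustrates that comparison under stated assumptions: the function names and densitometry values are hypothetical, and only the 50 ng input amount is taken from the Figure 6 legend.

```python
def translation_efficiency(protein_signal, rna_input_ng):
    """Protein band signal per ng of input RNA (arbitrary units per ng)."""
    if rna_input_ng <= 0:
        raise ValueError("RNA input must be positive")
    return protein_signal / rna_input_ng


def fold_repression(reference_efficiency, test_efficiency):
    """How many times less efficiently the test transcript is translated."""
    if test_efficiency <= 0:
        raise ValueError("test efficiency must be positive")
    return reference_efficiency / test_efficiency


# Hypothetical densitometry readings for equal 50 ng RNA inputs.
# The signal values are placeholders, not data from the paper.
eff_t2034 = translation_efficiency(protein_signal=900.0, rna_input_ng=50.0)
eff_t2516 = translation_efficiency(protein_signal=300.0, rna_input_ng=50.0)

repression = fold_repression(eff_t2034, eff_t2516)
print(repression)
```

Normalizing to RNA input matters here because the legend also describes lanes loaded with three times as much RNA; comparing raw protein signals across those lanes would understate the repression.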

To test further the inhibitory activity of the p53 3'UTR,
we examined the ability of the p53 3'UTR to control the
translation of a heterologous RNA. The Alu-containing
p53 DNA fragment extending from nucleotides 2034
to 2516 was excised from plasmid p2516 and inserted
downstream of a heterologous gene (CAT gene) in an
SP6-based plasmid vector to generate the plasmid pCAT-
Alu (Figure 7A). In vitro transcription and translation
revealed that non-polyadenylated CAT-Alu RNA was
translated 5-fold less efficiently than non-polyadenylated
CAT transcripts lacking the Alu sequence (Figure 7B).
When a different region of the p53 3'UTR (nucleotides
1465-2034 in plasmid p2516) with approximately the
same length as the Alu-containing fragment was inserted
downstream of the CAT gene, no effect on CAT translation
was observed (CAT-BS in Figure 7B). The ability of the
Alu-containing segment of the p53 3'UTR to act on a
heterologous transcript indicates that it likely represses
translation independently of upstream sequences.

The inhibitory activity of the Alu-like element on p53
translation was likely the result of its action in cis and
not simply due to non-specific inhibition of translation,
since a 3-fold increase in the amount of Alu-containing
transcript added to the reticulocyte lysate resulted in a
corresponding increase in the amount of p53 protein
synthesized (Figure 6B). Furthermore, when 200 ng of
luciferase RNA was added to a reticulocyte lysate together
with 200 ng of CAT-Alu or CAT-BS RNA, there was little
difference in the amount of luciferase synthesized (Figure
7C). Similarly, when 200 ng of luciferase RNA was added
to a reticulocyte lysate, either alone or mixed with 200 ng
of T2034 or T2516An RNA, there was little difference in
the amount of luciferase synthesized (data not shown).

To confirm that the decrease in p53 protein synthesis
from Alu-containing p53 RNAs was due to translational
regulation and not due to [illegible]
in the reticulocyte lysate, adenylated T2034 and T2516
synthetic transcripts were added to the rabbit reticulocyte
lysate under the same conditions as those used for in vitro
translation. After incubation for 15 or 60 min, RNA was
extracted from the lysate and the amount of synthetic p53
RNA present in the lysate determined by Northern blot




Fig. 7. The p53 Alu-like element can inhibit translation of a
heterologous CAT transcript. (A) Plasmids used to generate CAT
transcripts in vitro. (B) 200 ng of in vitro-synthesized CAT, CAT-Alu,
and CAT-BS transcripts were translated in a rabbit reticulocyte lysate
at 30°C for 30 min in the presence of [35S]methionine. The reactions
were stopped by adding an equal volume of the 2X [illegible]
buffer, heated to 100°C for 5 min and analyzed by SDS-PAGE and
autoradiography. An ethidium bromide-stained agarose gel
demonstrating the integrity and amount of synthetic transcripts that
were added to the in vitro translation reaction is shown below.
(C) 200 ng of luciferase RNA was translated in a rabbit reticulocyte
lysate either alone or in the presence of 200 ng of CAT-BS or 200 ng
of CAT-Alu. Reaction mixtures were incubated in the presence of
[35S]methionine at 30°C for 30 min and processed as in (B). The
in vitro-synthesized luciferase protein is shown. [illegible]

RNA used for in vitro translation is shown in the ethidium bromide-
stained agarose gel in the bottom panel.



analysis. Enhanced degradation of the Alu-containing
transcript was not observed (Figure 8). We conclude that
a segment of the p53 3'UTR encompassing the Alu-like
element is capable of repressing translation in vitro.

Discussion 

The observation that wild-type p53 protein expression in
leukemic blast cells does not correlate with the level of
p53 mRNA mirrors findings reported previously for blasts
and other human cell types (Matlashewski et al., [year illegible];
Kastan et al., 1991a; Slingerland et al., 1991; Sasano
et al., 1992; Hsu et al., 1993). The absence of detectable
p53 protein in cells expressing abundant levels of wild-
type p53 mRNA has usually been attributed to the short
half-life of p53 protein in normal cells (Rogel et al.,
1985). A similar situation exists in papillomavirus-
infected cells such as HeLa cells where p53 protein is not
detected even though these cells produce p53 mRNA and
this RNA is associated with polysomes (Matlashewski
Fig. 8. Stability of synthetic human p53 RNAs in rabbit reticulocyte
lysates. 100 ng of adenylated T2034 (A) and T2516 (B) synthetic
RNA was added to the rabbit reticulocyte lysate and incubated at 30°C
for 15 or 60 min under the same conditions used for in vitro
translation. RNA present in the lysates was then extracted and loaded
on a 1% agarose-formaldehyde gel. The 0 min time point represents
[illegible] ng of synthetic RNA loaded directly on the gel. The amount of
p53 RNA in each sample was then determined by Northern blotting
using a 32P-labelled human p53 cDNA. The lower panel shows the

18S ribosomal RNA recovered from the rabbit reticulocyte
lysates detected by ethidium bromide staining of the gel.

et al., 1986). The enhanced degradation of newly synthesized
p53 protein in HeLa cells was shown to be promoted
by the papillomavirus E6 protein which is expressed
constitutively in these cells (Scheffner et al., 1990).

In this report, we present data showing that differences
in p53 mRNA abundance exist in AML blasts and that
these differences cannot explain the heterogeneity in the
level of p53 protein expression.
Using a metabolic labelling assay in which blasts from
different AML patients were pulse-labelled with [35S]-
methionine for 15 min to minimize the contribution of
protein half-life on the detection of p53 protein synthesis,
we found differences in the level of p53 protein expression
in blast samples. These observations raised the possibility
that p53 gene expression may be regulated at the translational
level in certain human cells. We tested this possibility
by analyzing the distribution of p53 mRNA on polysomes
in vivo and by examining p53 RNA translation in vitro.
We have used two AML cell lines, OCI/AML-3 and
OCI/AML-4, that contain similar amounts of wild-type
p53 protein even though OCI/AML-3 contains 4- to 8-fold
more p53 mRNA. Comparison of the polysome profile of
these cells indicated that a greater proportion of the p53
mRNA was associated with larger polysomes in OCI/
AML-4 than in OCI/AML-3. p53 mRNA in both of these
cell lines as well as in blasts from different AML patients
is present as a single, full-length species of ~2.8 kb,
initiates from a common transcription start site and
contains similar sequence and structural elements.

Transcription-translation experiments in vitro indicated
that the p53 3'UTR contains a negative regulatory domain
that is capable of repressing translation in vitro. A region
of the 3'UTR consisting of ~500 nucleotides and containing
an Alu-like element is capable of repressing
translation of p53 mRNA and of a heterologous transcript.
The p53 3'UTR, when present in cis, repressed translation
of polyadenylated as well as non-polyadenylated transcripts.
[illegible]

possibly through its secondary structure, is capable of
repressing p53 mRNA translation. In addition, interaction
of the Alu-like element with the poly(A) tail may repress
the latter's function in translation. Experiments are in
progress to map precisely this regulatory element in the
p53 3'UTR and to determine if the p53 3'UTR plays a
similar role in regulating translation in vivo.

Our finding that p53 protein expression in AML blasts 
is controlled, at least in part, through mechanisms acting
at the translational level, raises the possibility that transla- 
tional regulation may provide an epigenetic mechanism 
to reduce or even eliminate wild-type p53 protein function
in leukemic blasts. In preliminary experiments to address
this point, we have exposed blast cells that express little
or no detectable p53 protein to 6 Gy of ionizing radiation
and have observed increased steady-state levels of p53 
protein at 1.5 h after irradiation (data not shown). Geno- 
toxic agents have been shown previously to increase the 
level and/or activity of p53 protein through a post- 
transcriptional mechanism that is not well understood 
(Kastan et al, 1991b; Fritsche et al, 1993; Lu and Lane, 
1993; Zhan et al, 1993). Hence, blast cells retain the 
ability to up-regulate p53 expression in response to geno- 
toxic stress. At least under these conditions, p53 function 
may not be lost. This type of analysis, however, does not 
address the function of p53 in proliferating cells that have
not been exposed to genotoxic stress. In this regard, 
previous studies from our laboratory demonstrated a highly 
significant correlation between p53 protein expression in 
leukemic blast cells and the secondary plating efficiency 
of these cells (Smith et al, 1986). The latter provides an 
estimate of the self-renewal capacity of progenitor cells
in the blast population. Deregulated p53 expression might, 
therefore, be expected to affect the self-renewal capacity 
of blasts in the absence of genotoxic stress. 

Accumulating evidence demonstrates the involvement
of the 3'UTR in translational control (Jackson, 1993). The
demonstration that the 3'UTR of certain transcripts can
control mRNA localization and polyadenylation provides
a mechanism for translational regulation (Huarte et al.,
1992; Gavis and Lehmann, 1994). In addition, specific
sequences within 3'UTRs have been shown to repress
translation (Goodwin et al., 1993; Evans et al., 1994;
Kwon and Hecht, 1993). RNA-protein interactions are
likely to be involved in 3'UTR-dependent translational
repression. Indeed, a protein that binds specifically to the
3'UTR of protamine 2 mRNA and represses its translation
has been identified (Kwon and Hecht, 1993). If the p53
3'UTR can be shown to regulate p53 mRNA translation
in vivo, it is possible that trans-acting factors (missing or
inactive in reticulocyte lysates) activate components of the
translational machinery to bypass this negative regulatory
domain on human p53 mRNA. Such trans-acting factors
could interact directly with p53 mRNA to enhance its rate
of translation. Alternatively, trans-acting factors directed
to the p53 3'UTR (that are also present in reticulocyte
lysates) may act as repressors of translation. Differences
in the level of p53 protein synthesis among AML blasts
and possibly other human cells could, therefore, be
determined by differences in the level or activity of these
regulatory molecules.
Materials and methods

OCI/AML-3 and OCI/AML-4 cell lines were derived from primary
blasts of two AML patients (Wang et al., 1989). The OCI-M2 cell line
was derived from the primary blasts of a patient whose [illegible]

[illegible] cells were grown in alpha-modified minimum essential
medium (α-MEM) containing 10% fetal calf serum (FCS) (GIBCO). The
OCI/AML-4 cells were grown in α-MEM containing 10% FCS and 10%
conditioned medium obtained from the human bladder carcinoma cell
line 5637 (5637-CM) (Wang et al., 1989). The AML blast cells were
obtained directly from AML patients. The mononuclear cell fraction of
peripheral blood was collected after separation through [illegible]
(1.077 g/ml) and T-lymphocyte depletion (Minden et al., 1979). These
cells were stored frozen in liquid nitrogen before use.

Metabolic labelling and immunoprecipitation
The blast cells of AML patients were [illegible]
at 37°C in α-MEM containing 10% FCS and 10% 5637-CM before
metabolic labelling. [illegible] cells were labelled with [illegible] [35S]-
methionine (DuPont NEN Research Products) in [illegible] ml α-MEM
lacking methionine and containing 10% dialysed FCS at 37°C for
15 min. Cells were then immediately pelleted, the radioactive medium
removed, and the cells lysed on ice in a solution containing 25 mM
Tris pH 7.4, 50 mM NaCl, 0.5% sodium deoxycholate, 2% NP40, 0.2%
SDS, 0.5 mM phenylmethylsulfonyl fluoride (PMSF), 1 µg/ml leupeptin
and 1 µg/ml [illegible] for 20 min. Lysates were cleared by
centrifugation, the supernatant was retained and incubated with
[illegible] of [illegible] IgG2a mouse monoclonal antibody (Sigma) for
60 min on ice. These were then reacted with 0.5 ml of a 10% suspension
of formalin-treated Staphylococcus aureus Cowan 1 cells (Pansorbin,
Calbiochem-Behring) [illegible]

min on ice, followed by centrifugation and retention of the
supernatant. Portions of precleared lysates containing equal numbers of
trichloroacetic acid-insoluble counts (10^7 c.p.m.) were diluted in NET/
GEL buffer (150 mM NaCl, 5 mM EDTA pH 8.0, 50 mM Tris pH 7.4,
0.05% NP40, 0.02% sodium azide, 0.25% gelatin) and immunoprecipitated
on ice for 2 h with PAb421 monoclonal antibodies against p53
protein or control PAb419 antibodies (Harlow et al., 1981). The immune
complexes were collected on 60 µl [illegible] protein A-Sepharose
beads (Pharmacia), washed three times with NET/GEL buffer, and eluted

[illegible] sample buffer (2% SDS, 10% glycerol, [illegible]

bromophenol blue, 25 mM Tris pH 6.8, 0.1 M dithiothreitol) by boiling
[illegible] min. The Sepharose beads were removed by centrifugation, the
samples were loaded on a 10% polyacrylamide gel containing SDS and
proteins were resolved by electrophoresis at 45 mA. Gels were fixed in
[illegible] acetic acid and 25% [illegible] for 30 min before drying and
exposure to X-ray film (DuPont NEN Research Products).

[illegible]

buffer, extracts were passed through a 21-gauge needle several
times to reduce viscosity and boiled for 10 min before electrophoresis
at 45 mA on a 10% polyacrylamide gel containing SDS. The
proteins were transferred to a nitrocellulose membrane (Schleicher
& Schuell), and the abundance of p53 protein was estimated by
immunoblotting with a human p53-specific monoclonal antibody
PAb1801 (Banks et al., 1986). Bound antibody was detected using the
enhanced chemiluminescence detection system (DuPont NEN Research
Products) according to the manufacturer's instructions.

[illegible] Total RNA was isolated using the guanidinium thiocyanate-
[illegible] method (Chirgwin et al., 1979). 20 µg of total RNA
was separated by electrophoresis on a 1% agarose gel containing 6%
formaldehyde and transferred to a nitrocellulose membrane (Schleicher

& Schuell) [illegible] by a random priming reaction (Feinberg and
Vogelstein, 1983), washed and exposed to X-ray film. The amount of RNA
was [illegible] with a Molecular Dynamics PhosphorImager using

[illegible]

of p53 cDNA from the pR4-2 plasmid (Harlow et al., 1985); the L35

[illegible] cDNA (Herzog et al., 1990); the GAPDH probe was a 1.3 kb
PstI fragment of [illegible] cDNA (Fort et al., 1985); the 18S ribosomal



RNA probe was the EcoRI fragment from the human ribosomal
gene (Torczynski et al., 1985).

Genomic DNA preparation 

Genomic DNA from OCI/AML-3 and OCI/AML-4 cell lines was isolated
following a modification of the procedure described by Kupiec et al.
(1987). 3 × 10^7 cells were washed with ice-cold PBS buffer, resuspended
in 3 ml of lysis buffer (20 mM EDTA pH 8.0, 100 ng/ml proteinase K,
0.5% sarkosyl) and incubated at 50°C for 3 h. DNA was extracted
with phenol/chloroform, dialysed against 50 mM Tris-HCl pH 8.0,
10 mM EDTA, 10 mM NaCl at 4°C, and then treated with RNase A
(100 ng/ml) at 37°C for 3 h. DNA was again extracted with phenol/
chloroform and dialysed against 10 mM Tris pH 7.4, 1 mM EDTA.
DNA concentration was determined by measuring the absorbance at
260 nm.

Amplification of p53 sequences from RNA and DNA
20 µg of total RNA was precipitated with ethanol and resuspended in a
30 µl reaction containing 300 ng of oligo(dT) primer (Amersham
International), 50 mM Tris-HCl pH 8.3, 77 mM KCl, 3 mM MgCl2,
3 mM dithiothreitol, 3 mM dNTP, 30 units of RNAguard (Pharmacia)
and 200 units of Moloney murine leukemia virus reverse transcriptase
(GIBCO-BRL) and incubated at 42°C for 60 min. The first strand cDNA
was then used as the template for amplification by PCR using Taq
polymerase (Promega). PCR amplification was performed with 10 µl of
each first strand cDNA as the template and 40 cycles of denaturation
(94°C, 1 min), annealing (64°C, 30 s), and elongation (72°C, 1 min).
The following p53-specific primers were used for amplifying the complete
coding region and the 3'UTR: 5'SX1 (sense, exon 1,
GACACTTTGCGTTC[illegible]GCTGGGAG), 5'SX5A (sense, exon 5,
GAGCGCTGCTCAGATAGCGATG), 3'SX11 (sense, exon 11,
GAAGGGCCTGACTCAGACTGAC), 3'AX-6 (antisense, exon 6,
AGATGCTGAGGAGCGGCCAGAC), AS-3 (antisense, exon 11,
GAGGGAGAGATGGCGGTGGGAGGCTGTC) and AS-4 (antisense, exon 11,
GGCAGCAAAGTTTTATTGTAAAATAAG). The 5'UTR and sequences further
upstream were amplified from 1 µg genomic DNA using the following pair
of p53-specific primers: 5'UTR-1 (sense, promoter region,
ACCTAAGCTTGTCATGGCGACTGTCCAGCTTTG) and p-EX (antisense, exon
1, CCAATCCAGGGAAGCGTGTCACCG).

Direct sequencing of double-stranded PCR products
Double-stranded DNA fragments produced by PCR amplification were
eluted from agarose gels and purified by extraction with phenol/
chloroform. 200 ng of purified PCR product were mixed with human
p53-specific oligonucleotides as sequencing primers, frozen in dry ice,
dried in a centrifugal evaporator (Savant SpeedVac), resuspended in
sequencing buffer (40 mM Tris-HCl pH 7.5, 25 mM MgCl2, 50 mM
NaCl, 10% DMSO) and subjected to the sequencing reaction as described
by Winship (1989).

RNase protection assay 

Plasmid p729 was constructed from three DNA fragments in two stages.
A 330 bp DNA fragment derived from the human p53 gene promoter
was excised from the p2E-H2BX plasmid (Lamb and Crawford, 1986)
with HindIII and XbaI and inserted into the pGEM-4 plasmid (Promega)
between the HindIII and XbaI sites. In the second stage, a fragment
corresponding to the 5' end of p53 mRNA was obtained by RT-PCR
using p53 mRNA prepared from OCI/AML-3 cells and the p53-specific
primers 5'UTR-3 (sense, exon 1,
CCGGAAGCTTCAAAA[illegible]GAGCCACCGTCCAG) and 5'AX4 (antisense,
exon 4, [illegible]AGCTGCTGCTGGTGC). The resulting fragment was
end-filled with the Klenow fragment of DNA polymerase I, digested with
XbaI at the site present in the 5'UTR-3 primer (shown underlined) and
inserted between the XbaI and SmaI sites present in the plasmid
generated in the first stage.

p729 was linearized with HindIII and a 729 nucleotide antisense probe
was prepared by transcription with SP6 RNA polymerase. The transcription
reaction mixture contained 50 mM Tris-HCl (pH [illegible]), [illegible] mM
MgCl2, 4 mM spermidine, 10 mM NaCl, 0.5 mM each of ATP, GTP and
CTP, 12 µM UTP, [illegible] µCi [32P]UTP, 10 mM dithiothreitol, 20 units of
RNAguard, 0.5 µg of linearized template and 10 units of SP6 RNA
polymerase in a final volume of 20 µl. After incubation at 37°C for
60 min, the DNA template was digested with DNase I and the RNA
probe was extracted with phenol/chloroform, precipitated with ethanol
and resuspended in water. This RNA probe covered the entire p53 gene
promoter region and included the first three exons and part of the
fourth exon. p53 transcripts initiating from one of the major start sites
should yield protected fragments of ~385 nucleotides. p53 transcripts
initiating [illegible] 5' of the minor start sites should yield
[illegible] (Tuck and Crawford, 1989).

[illegible] 1 × 10[illegible] c.p.m. of the labelled probe and precipitated
with ethanol. The RNA/probe mixtures were then washed, dried and
resuspended in 10 µl of hybridization solution (Winter et al., 1985),
[illegible] for [illegible] min and hybridized at [illegible] overnight.
After hybridization, the samples were mixed with 0.18 ml of RNase
digestion mix containing 60 µg/ml of RNase A (type III, Sigma),
1100 U/ml of RNase T1 (Boehringer Mannheim), [illegible] mM NaCl,
5 mM EDTA, 10 mM Tris-HCl pH 7.5.
After incubation at 37°C for 60 min, the digestion was halted by
addition of 10 µl of 20% SDS and 5 µl of proteinase K (10 mg/ml)
(Boehringer Mannheim) and incubation at 37°C for 15 min. Protected
fragments were extracted with phenol/chloroform, precipitated with
ethanol, resolved by denaturing gel electrophoresis and visualized by
autoradiography.

Polysome analysis
5 × 10^7 cells were washed once in ice-cold Tris-saline solution (25 mM
Tris-HCl pH 7.5, 25 mM NaCl) containing 10 mM MgCl2 and 10 µg/ml
cycloheximide. The cells were then immediately lysed on ice with the
use of a Dounce homogenizer in 2 ml homogenization buffer containing
25 mM Tris-HCl pH 7.5, 25 mM NaCl, 10 mM MgCl2, 2% Triton
X-100, 340 U/ml heparin (LEO Laboratories Canada Ltd), [illegible] mM
vanadyl ribonucleotide complex (Sigma), 2.5 mM PMSF, 10 µg/ml
cycloheximide, 1 mM dithiothreitol and 1 mM EGTA. The extract was
centrifuged at 14 000 r.p.m. for 6 min at 4°C to remove cell debris, the
supernatant was collected and layered over a 15-50% linear sucrose
gradient (11 ml) prepared in homogenization buffer. The gradients were
centrifuged in an SW41 Beckman rotor at 175 000 g for 110 min at
4°C. Ten fractions of equal volume were collected from the bottom of
the tubes. RNA was prepared from each of the fractions by phenol/
chloroform extraction and ethanol precipitation and resuspended in
[illegible] µl DEPC-treated water. The amount of p53 mRNA in each fraction
(100 µl of the RNA sample) was determined by dot-blot hybridization
analysis using a 32P-labelled human p53 cDNA probe. Polysomes used
to calibrate the gradients were prepared in exactly the same way except
for an additional purification step involving precipitation of the polysomes
present in the homogenate with 100 mM MgCl2 for 1 h on ice before
sucrose gradient sedimentation. For calibration, 0.3-ml fractions were
collected from the bottom of the gradient and A260 of each fraction was
determined.
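
To relate a fraction number to its approximate position in the gradient, the sucrose range covered by each of the ten equal-volume fractions can be estimated. The sketch below is illustrative only. It assumes an ideal gradient that is linear in volume from 50% at the bottom to 15% at the top, and it ignores the volume of the layered extract and any mixing during collection. The function name is hypothetical.

```python
def gradient_fractions(n_fractions=10, gradient_ml=11.0,
                       top_pct=15.0, bottom_pct=50.0):
    """Split an idealized linear sucrose gradient into equal-volume fractions.

    Fractions are numbered from the bottom of the tube (fraction 1 is the
    densest), matching collection from the bottom. Assumes the sucrose
    concentration changes linearly with volume; a real gradient only
    approximates this.
    """
    step = (bottom_pct - top_pct) / n_fractions   # % sucrose spanned per fraction
    volume = gradient_ml / n_fractions            # ml per fraction
    fractions = []
    for i in range(n_fractions):
        high = bottom_pct - i * step
        low = high - step
        fractions.append({
            "fraction": i + 1,
            "volume_ml": volume,
            "low_pct": low,
            "high_pct": high,
            "mean_pct": (low + high) / 2,
        })
    return fractions


if __name__ == "__main__":
    for f in gradient_fractions():
        print(f"fraction {f['fraction']:2d}: {f['low_pct']:.2f}-{f['high_pct']:.2f}% "
              f"sucrose, {f['volume_ml']:.1f} ml")
```

Under these assumptions each fraction is 1.1 ml and spans 3.5 percentage points of sucrose, so heavy polysomes would be expected in the low-numbered fractions and free mRNP material in the high-numbered ones.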


Templates for in vitro transcription and translation 
Plasmid p2516 contains nearly full-length human p53 cDNA
and was constructed by the correct ligation of three cDNA fragments.
The fragment corresponding to the 5' end of the p53 transcript was
obtained from pR4-2 (Harlow et al., 1985) after digestion with XbaI
and [illegible], which cut in exons 1 and 5, respectively. The middle
fragment was obtained from pProSp53 (Matlashewski et al., [illegible])
after digestion with [illegible] and BamHI which cut in exons 5 and 11,
respectively. The fragment corresponding to the 3' end of the p53
transcript was obtained by RT-PCR amplification of the 3'UTR of p53
mRNA using p53-specific oligonucleotides as primers, 3'SX13 (sense,
exon [illegible], CTCTCCCCATCCCACACCCTGG) and AS-4. The PCR-amplified
fragment was end-filled with the Klenow fragment of DNA polymerase I
and digested at an internal BamHI site. These three fragments, which
represent contiguous sequences of the native p53 transcript, were inserted
[illegible] sites of a modified form of the [illegible] vector in
which polylinker sequences [illegible]
the XbaI site were deleted. The resulting plasmid is referred to as
p2516 and yields a p53 transcript in vitro starting with the sequence
5' [illegible] 3'. The in vitro transcript is nearly
[illegible]
transcription initiation sites in vivo which start with 5' CAAAAGTCTA[illegible]
(Tuck and Crawford, 1989). The beginning of identity corresponding
to the XbaI site in the cDNA is underlined. Digestion of p2516
with EcoRI provides a template that can produce a synthetic full-length
p53 transcript of 2516 nucleotides. Digestion with BamHI provides a
template for a truncated p53 transcript of [illegible] nucleotides.

[illegible] p2516 that contains the Alu-like element present in the p53
3'UTR was inserted immediately downstream of the CAT gene. The plasmid
pCAT-BS was constructed by removing the SmaI-BamHI fragment of the p53
3'UTR present in p2516 and inserting this fragment in reverse orientation
into pSP6CAT immediately downstream of CAT. This SmaI-BamHI
fragment is missing the Alu-like element present at the distal end of the
p53 3'UTR.

In vitro transcription and in vitro polyadenylation
Plasmid DNAs containing templates for in vitro transcription were
linearized at selected restriction endonuclease sites. Standard transcription
assays (Melton et al., 1984) were performed as described above for the
preparation of antisense RNA probes with the omission of [32P]UTP.
0.3 mM m7G(5')ppp(5')G and 0.05 mM GTP were included in the
reactions to provide efficient capping at the 5' end of synthetic transcripts.
Polyadenylation reactions contained synthetic RNA, 0.2 mM ATP, 50 mM
Tris-HCl pH 8.0, 10 mM MgCl2, 250 mM NaCl, 2 mM MnCl2, 2 mM
dithiothreitol, 1 unit/µl RNAguard (Pharmacia), 500 µg/ml of BSA
(Pharmacia) and 5 units of poly(A) polymerase (Pharmacia) in a 50 µl
final volume (McGrew et al., 1989). After 30 min at 37°C polyadenylated
RNAs were purified by phenol/chloroform extraction and ethanol
precipitation.

In vitro translation and immunoprecipitation
Synthetic transcripts were translated in micrococcal-nuclease-treated
rabbit reticulocyte lysates (Promega) under the conditions recommended
by the supplier. Reactions containing p53 transcripts were incubated for
30 min at 30°C in the presence of [35S]methionine and stopped by
addition of dithiothreitol to a final concentration of 1 mM and EDTA
pH 8.0 to a final concentration of 10 mM. Each reaction was then
divided into two aliquots, one for immunoprecipitation with the
p53-specific monoclonal antibody PAb421 and the other for
immunoprecipitation with a control antibody PAb419. Reactions containing
CAT or luciferase transcripts were incubated for 30 min at 30°C in the
presence of [35S]methionine and were stopped by addition of protein sample
buffer, boiled for 5 min and resolved by polyacrylamide gel electrophoresis.

Acknowledgements 

This work was supported by grants from the Medical Research Council
of Canada and from the National Cancer Institute of Canada. 



References 

Banks,L., Matlashewski,G. and Crawford,L. (1986) Isolation of human
p53-specific monoclonal antibodies and their use in the studies of
human p53 expression. Eur. J. Biochem., 159, 529-534.

Benchimol,S., Munroe,D.G., Peacock,J., Gray,D. and Smith,L.J. (1989)
Abnormalities in structure and expression of the p53 gene in leukemia.
Cancer Cells, 7, 121-125.

Bienz,B., Zakut-Houri,R., Givol,D. and Oren,M. (1984) Analysis of the
gene coding for the murine cellular tumour antigen p53. EMBO J., 3,
2179-2183.

Bienz-Tadmor,B., Zakut-Houri,R., Libresco,S., Givol,D. and Oren,M.
(1985) The 5' region of the p53 gene: evolutionary conservation and
evidence for a negative regulatory element. EMBO J., [illegible].

Chirgwin,J.M., Przybyla,A.E., McDonald,R.J. and Rutter,W.J. (1979)
Isolation of biochemically active ribonucleic acid from sources
enriched in ribonuclease. Biochemistry, 18, 5294-5299.

Evans,T.C., Crittenden,S.L., Kodoyianni,V. and Kimble,J. (1994)
Translational control of maternal glp-1 mRNA establishes an
asymmetry in the C. elegans embryos. Cell, 77, 183-194.

Feinberg,A.P. and Vogelstein,B. (1983) A technique for radio-labeling
DNA restriction endonuclease fragments to high specific activities.
Anal. Biochem., 132, 6-13.

Fenaux,P., Jonveaux,P., Quiquandon,I., Lai,J.L., Pignon,J.M.,
Loucheux-Lefebvre,M.H., Bauters,F., Berger,R. and Kerckaert,J.P. (1991)
p53 gene mutations in acute myeloid leukemia with 17p monosomy. Blood,
78, 1652-1657.

Fenaux,P., Preudhomme,C., Quiquandon,I., Jonveaux,P., [illegible],
Vanrumbeke,M., Loucheux-Lefebvre,M.H., Bauters,F., Berger,R. and
Kerckaert,J.P. (1992) Mutations of the p53 gene in acute myeloid
leukaemia. Br. J. Haematol., 80, 178-183.

Feng,S. and Holland,E.C. (1988) HIV-1 tat trans-activation requires the
loop sequence within tar. Nature, 334, 165-167.

Fialkow,P.J., Singer,J.W., Raskind,W.H., Adamson,J.W., Jacobson,R.J.,
[illegible], Najfeld,V. and Veith,R. (1987) Clonal development,
stem-cell differentiation, and clinical remission in acute
nonlymphocytic leukemia. N. Engl. J. Med., 317, 468-473.
Fritsche,M., Haessler,C. and Brandner,G. (1993) Induction of nuclear
accumulation of the tumor-suppressing protein p53 by DNA-damaging
agents. Oncogene, 8, 307-318.

Fort,P., Marty,L., Piechaczyk,M., Sabrouty,S.E., Dani,C., Jeanteur,P. and
Blanchard,J.M. (1985) Various rat adult tissues express only one major
mRNA species from the glyceraldehyde-3-phosphate-dehydrogenase
multigenic family. Nucleic Acids Res., 13, 1431-1442.

Fu,L., Ye,R., Browder,L.W. and Johnston,R.N. (1991) Translational
potentiation of messenger RNA with secondary structure in Xenopus.
Science, 251, 807-810.

Gavis,E.R. and Lehmann,R. (1994) Translational regulation of nanos by
RNA localization. Nature, 369, 315-318.

Goodwin,E.B., Okkema,P.G., Evans,T.C. and Kimble,J. (1993)
Translational regulation of tra-2 by its 3'UTR controls sexual identity
in C. elegans. Cell, 75, 329-339.

Harlow,E., Crawford,L.V., Pim,D.C. and Williamson,N.M. (1981)
Monoclonal antibodies specific for Simian virus 40 tumor antigen.
J. Virol., 39, 861-869.

Harlow,E., Williamson,N.M., Ralston,R., Helfman,D.M. and Adams,T.E.
(1985) Molecular cloning and in vitro expression of a DNA clone for
human cellular tumor antigen p53. Mol. Cell. Biol., 5, 1601-1610.

Herzog,H., Hofferer,L., Schneider,R. and Schweiger,M. (1990) cDNA
encoding the human homologue of rat ribosomal protein L35a. Nucleic
Acids Res., 18, 4600.

Hsu,H.C., Tseng,H.J., Lai,P.L., Lee,P.H. and Peng,S.Y. (1993) Expression
of p53 gene in 184 unifocal hepatocellular carcinomas: association
with tumor growth and invasiveness. Cancer Res., 53, 4691-4694.

Huarte,J., Stutz,A., O'Connell,M.L., Gubler,P., Belin,D., Darrow,A.L.,
Strickland,S. and Vassalli,J.D. (1992) Transient translational silencing
by reversible mRNA deadenylation. Cell, 69, 1021-1030.

Jackson,R.J. (1993) Cytoplasmic regulation of mRNA function: the
importance of the 3'UTR. Cell, 74, 9-14.

Kastan,M.B. et al. (1991a) Levels of p53 protein increase with maturation
in human hematopoietic cells. Cancer Res., 51, 4279-4286.

Kastan,M.B., Onyekwere,O., Sidransky,D., Vogelstein,B. and Craig,R.W.
(1991b) Participation of p53 protein in the cellular response to DNA
damage. Cancer Res., 51, 6304-6311.

Kupiec,J.J., Giron,M.L., Vilette,D., Jeltsch,J.M. and Emanoil-Ravier,R.
(1987) Isolation of high-molecular-weight DNA from eukaryotic cells
by formamide treatment and dialysis. Anal. Biochem., 164, 53-59.

Kwon,Y.K. and Hecht,N.B. (1993) Binding of a phosphoprotein to the
3' untranslated region of the mouse protamine 2 mRNA temporally
represses its translation. Mol. Cell. Biol., 13, 6547-6557.

Lai,J.L., Preudhomme,C., Zandecki,M., Flactif,M., Vanrumbeke,M.,
Lepelley,P., Wattel,E. and Fenaux,P. (1995) Myelodysplastic
syndromes and acute myeloid leukemia with 17p deletion. An entity
characterized by specific dysgranulopoiesis and a high incidence of
P53 mutations. Leukemia, 9, 370-381.

Lamb,P. and Crawford,L. (1986) Characterization of the human p53
gene. Mol. Cell. Biol., 6, 1379-1385.

Lu,X. and Lane,D.P. (1993) Differential induction of transcriptionally
active p53 following UV or ionizing radiation: Defects in chromosome
instability syndromes? Cell, 75, 765-778.

Matlashewski,G., Lamb,P., Pim,D., Peacock,J., Crawford,L. and
Benchimol,S. (1984) Isolation and characterization of a human p53
cDNA clone: expression of the human p53 gene. EMBO J., 3,
3257-3262.

Matlashewski,G., Banks,L., Pim,D. and Crawford,L.V. (1986) Analysis
of human p53 proteins and mRNA levels in normal and transformed
cells. Eur. J. Biochem., 154, 665-672.

Matlashewski,G.J., Tuck,S., Pim,D., Lamb,P., Schneider,J. and Crawford,
L.V. (1987) Primary structure polymorphism at amino acid residue 72 of
human p53. Mol. Cell. Biol., 7, 961-963.

Melefors,O. and Hentze,M.W. (1993) Translational regulation by mRNA-
protein interactions in eukaryotic cells: ferritin and beyond. BioEssays,
15, 85-90.

Melton,D.A., Krieg,P., Rebagliati,M., Maniatis,T., Zinn,K. and Green,M.R.
(1984) Efficient in vitro synthesis of biologically active RNA and
RNA hybridization probes from plasmids containing a bacteriophage
SP6 promoter. Nucleic Acids Res., 12, 7035-7056.

Milner,J. (1991) A conformation hypothesis for the suppressor and
promoter functions of p53 in cell growth control and in cancer. Proc.
R. Soc. Lond. B, 245, 139-145.



Minden,M.D., Buick,R.N. and McCulloch,E.A. (1979) Separation of
blast cell and T-lymphocyte progenitors in the blood of patients with
acute myeloblastic leukemia. Blood, 54, 186-195.

Moll,U.M., Riou,G. and Levine,A.J. (1992) Two distinct mechanisms
alter p53 in breast cancer: mutation and nuclear exclusion. Proc. Natl
Acad. Sci. USA, 89, 7262-7266.

Moll,U.M., LaQuaglia,M., Benard,J. and Riou,G. (1995) Wild-type
p53 protein undergoes cytoplasmic sequestration in undifferentiated
neuroblastomas but not in differentiated tumors. Proc. Natl Acad. Sci.
USA, 92, 4407-4411.

Momand,J., Zambetti,G.P., Olson,D.C., George,D. and Levine,A.J.
(1992) The mdm-2 oncogene product forms a complex with the p53
protein and inhibits p53-mediated transactivation. Cell, 69, 1237-1245.

Mosner,J., Mummenbrauer,T., Bauer,C., Sczakiel,G., Grosse,F. and
Deppert,W. (1995) Negative feedback regulation of wild-type p53
biosynthesis. EMBO J., 12, 4739-4746.

Oliner,J.D., Kinzler,K.W., Meltzer,P.S., George,D. and Vogelstein,B.
(1992) Amplification of a gene encoding a p53-associated protein in
human sarcomas. Nature, 358, 80-83.

Papayannopoulou,T., Nakamoto,B., Kurachi,S., Tweeddale,M. and
Messner,H. (1988) Surface antigenic profile and globin phenotype of
two new human erythroleukemia lines: characterization and
interpretations. Blood, 72, 1029-1038.

Pause,A., Methot,N. and Sonenberg,N. (1993) The HRIGRXXR region
of the DEAD box RNA helicase eukaryotic translation initiation factor
4A is required for RNA binding and ATP hydrolysis. Mol. Cell. Biol.,
13, 6789-6798.

Rogel,A., Popliker,M., Webb,C.G. and Oren,M. (1985) p53 cellular
tumor antigen: analysis of mRNA levels in normal adult tissues,
embryos, and tumors. Mol. Cell. Biol., 5, 2851-2855.

Sasano,H., Goukon,Y., Nishihira,T. and Nagura,H. (1992) In situ
hybridization and immunohistochemistry of p53 tumor suppressor
gene in human esophageal carcinoma. Am. J. Pathol., 141, 545-550.

Scheffner,M., Werness,B.A., Huibregtse,J.M., Levine,A.J. and Howley,
P.M. (1990) The E6 oncoprotein encoded by human papillomavirus
types 16 and 18 promotes the degradation of p53. Cell, 63, 1129-1136.

Slingerland,J.M., Minden,M.D. and Benchimol,S. (1991) Mutations of
the p53 gene in human acute myelogenous leukemia. Blood, 77,
1500-1507.

Smith,L.J., McCulloch,E.A. and Benchimol,S. (1986) Expression of the
p53 oncogene in acute myeloblastic leukemia. J. Exp. Med., 164,
751-761.

Sugimoto,K., Toyoshima,H., Sakai,R., Miyagawa,K., Hagiwara,K.,
Hirai,H., Ishikawa,F. and Takaku,F. (1991) Mutations of the p53 gene
in lymphoid leukemia. Blood, 77, 1153-1156.

Sugimoto,K., Hirano,N., Toyoshima,H., Chiba,S., Mano,H., Takaku,F.,
Yazaki,Y. and Hirai,H. (1993) Mutations of the p53 gene in
myelodysplastic syndromes and MDS-derived leukemia. Blood, 81,
3022-3026.

Torczynski,R.M., Fuke,M. and Bollon,A.P. (1985) Cloning and
sequencing of a human 18S ribosomal RNA gene. DNA, 4, 283-291.

Trecca,D., Longo,L., Biondi,A., Cro,L., Calori,R., Grignani,F.,
Maiolo,A.T., Pelicci,P.G. and Neri,A. (1994) Analysis of p53 gene
mutations in acute myeloid leukemia. Am. J. Hematol., 46, 304-309.

Tuck,S.P. and Crawford,L. (1989) Characterization of the human p53
gene promoter. Mol. Cell. Biol., 9, 2163-2172.

Ueda,H., Ullrich,S.J., Gangemi,J.D., Kappel,C.A., Ngo,L., Feitelson,
M.A. and Jay,G. (1995) Functional inactivation but not structural
mutation of p53 causes liver cancer. Nature Genet., 9, 41-47.

Ullrich,S.J., Mercer,W.E. and Appella,E. (1992) Human wild-type p53
adopts a unique conformational and phosphorylation state in vivo
during growth arrest of glioblastoma cells. Oncogene, 7, 1635-[illegible].

Wang,C., Curtis,J.E., Minden,M.D. and McCulloch,E.A. (1989)
Expression of a retinoic acid receptor gene in myeloid leukemia cells.
Leukemia, 3, 264-269.

Wattel,E., Preudhomme,C., Hecquet,B., Vanrumbeke,M., Quesnel,B.,
Dervite,I., Morel,P. and Fenaux,P. (1994) p53 mutations are associated
with resistance to chemotherapy and short survival in hematologic
malignancies. Blood, 84, 3148-3157.

Wiggans,R.G., Jacobson,R.J., Fialkow,P.J., Woolley,P.V., Macdonald,J.
and Schein,P.S. (1978) Probable clonal origin of acute myeloblastic
leukemia following radiation and chemotherapy of colon cancer.
Blood, 52, 650-663.

Winship,P.R. (1989) An improved method for directly sequencing PCR
amplified material using dimethyl sulphoxide. Nucleic Acids Res.,
17, 1266.






Winter,E., Yamamoto,F., Almoguera,C. and Perucho,M. (1985) A method
to detect and characterize point mutations in transcribed genes:
amplification and overexpression of the mutant c-Ki-ras allele in
human tumor cells. Proc. Natl Acad. Sci. USA, 82, 7575-7579.

[illegible] (1993) Induction of cellular p53 [illegible].

[illegible], Estey,E., Hester,J. and Deisseroth,A. (1992) Altered
[illegible] protein in myeloid [illegible]
mitogen-stimulated normal blood cells. Oncogene, 7, 1645-1647.

Zhu,Y.M., Bradbury,D. and Russell,N. (1993) Expression of different
[illegible]
is related to in vitro growth characteristics. Br. J. Cancer, [illegible],
851-855.



www.nature.com/onc



Degradation of the E7 human papillomavirus oncoprotein by the ubiquitin- 
proteasome system: targeting via ubiquitination of the N-terminal residue 

Eyal Reinstein1, Martin Scheffner2, Moshe Oren3, Aaron Ciechanover*,1 and Alan Schwartz4

1 Department of Biochemistry and the Rappaport Family Institute for Research in the Medical Sciences, The Bruce Rappaport
Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 31096, Israel; 2 Institut für Biochemie, Medizinische Fakultät,
Universität zu Koeln, 50931 Koeln, Germany; 3 Department of Molecular Cell Biology, The Weizmann Institute of Science, Rehovot
76100, Israel; 4 Departments of Pediatrics and of Molecular Biology and Pharmacology, Washington University School of Medicine
and St. Louis Children's Hospital, St. Louis, Missouri, MO 63110-1093, USA



The E7 oncoprotein of the high risk human papilloma- 
virus type 16 (HPV-16), which is etiologically associated 
with uterine cervical cancer, is a potent immortalizing 
and transforming agent. It probably exerts its oncogenic
functions by interacting with and altering the normal activity
of cell cycle control proteins such as p21WAF1, p27Kip1 and
pRb, transcriptional activators such as TBP and AP-1, 
and metabolic regulators such as M2-pyruvate kinase 
(M2-PK). Here we show that E7 is a short-lived protein 
and its degradation both in vitro and in vivo is mediated 
by the ubiquitin-proteasome pathway. Interestingly, 
ubiquitin does not attach to any of the two internal 
Lysine residues of E7. Substitution of these residues with 
Arg does not affect the ability of the protein to be 
conjugated and degraded; in contrast, addition of a Myc 
tag to the N-terminal but not to the C-terminal residue, 
stabilizes the protein. Also, deletion of the first 11 amino 
acid residues stabilizes the protein in cells. Taken 
together, these findings strongly suggest that, like MyoD 
and the Epstein Barr Virus (EBV) transforming Latent 
Membrane Protein 1 (LMP1), the first ubiquitin moiety 
is attached linearly to the free N-terminal residue of E7. 
Additional ubiquitin moieties are then attached to an 
internal Lys residue of the previously conjugated 
molecule. The involvement of E7 in many diverse and 
apparently unrelated processes requires tight regulation 
of its function and cellular level, which is controlled in 
this case by ubiquitin-mediated proteolysis. Oncogene 
(2000) 19, 5944-5950.

Keywords: human papilloma-virus (HPV); E7; ubiqui- 
tin; proteolysis; N-terminus 



Introduction 

The E7 oncoprotein of the high risk human papillo- 
mavirus type 16, which is etiologically associated with 
pathogenesis of human uterine cervical cancer, is a 
potent immortalizing and transforming protein. Expression
of E7 can transform rodent fibroblasts (Kanda
et al., 1988), and in conjunction with an activated Ras
oncogene, primary rodent cells (Phelps et al., 1988).
Continued expression of the E7 gene is required for the
maintenance of the transformed phenotype (Crook et
al., 1989), and expression of the protein in non-metastatic
mouse cell lines renders the cells metastatic
in nude mice (Chen et al., 1993). In transgenic mice,
co-expression of E7 along with E6, another high risk
HPV oncoprotein, elicits epidermal hyperplasia
(Auewarakul et al., 1994), verrucose lesions and papillomas
(Greenhalgh et al., 1994). Furthermore, E6 and E7 can
cooperate to induce various tumors when expressed
ectopically in transgenic mice (Arbeit et al., 1993; Pan
and Griep, 1994). Finally, both E7 and E6 are
necessary and sufficient to immortalize their primary
host cells, human squamous epithelial cells
(Hawley-Nelson et al., 1989; Munger et al., 1989).

While the molecular mechanisms that underlie the
transforming and immortalizing activity of E7 are still
obscure, the protein appears to exert most of its
oncogenic functions by interacting physically with key
cellular regulatory proteins, which leads to modulation
of their normal activity. One main function of E7 is its
ability to deregulate control of cell cycle progression,
allowing cells to exit G0 and enter S phase. It has been
shown that via its cd2 domain, E7 binds to the cell
cycle regulators p107, p130 and pRb (Arroyo et al.,
1993; Davies et al., 1993; Hu et al., 1995). Normally,
these proteins function as transcriptional repressors
that lead to G1 arrest. It was suggested that the
binding of E7 to these proteins leads to their
dissociation from their complex with E2F, which
correlates with stimulation of E2F-dependent transcription.
It has also been shown that E7 interacts with
both p27Kip1 (Zerfass-Thome et al., 1996) and p21WAF1
(Funk et al., 1997). Consequently, both proteins fail to
block the activity of Cyclin E/Cdk2 complexes, which
allow transition of the cell across the G1/S border.
Binding of E7 to the Jun component of AP-1 can lead
to activation of AP-1 driven genes (Antinore et al.,
1996). It has also been shown that E7 binds to
M2-pyruvate kinase (M2-PK), lowers its affinity to
phosphoenol-pyruvate, and thus slows influx of
substrates into the tricarboxylic, citric acid cycle
(Zwerschke et al., 1999). This leads to accumulation
of upstream phosphometabolites which serve as
precursors to amino acids and nucleotides. The pool
of these precursors is low in resting cells, but its
expansion is necessary during rapid cell division. E7
can, however, also act through a different mechanism; similar to
targeting of p53 for degradation by HPV E6 (Scheffner
et al., 1990), it has been shown that association of E7
with pRb also targets the repressor for
ubiquitin-mediated degradation (Boyer et al., 1996). Targeting of
pRb, and potentially of other regulatory proteins, for
degradation, may serve as a second mechanism, besides
physical interaction, by which E7 exerts its deregulatory
effects. In the case of pRb, removal of the
protein induces the activity of the E2F family of
cellular transcription factors which are known to
control the expression of the major cell cycle regulatory
genes at the G1/S transition.

It has been reported that E7 is short-lived (Selvey et
al., 1994), however the system involved in its
degradation and the mechanism(s) that underlie the
process have remained obscure. Many studies have
implicated the ubiquitin pathway in the degradation of
various short-lived key regulatory proteins. It is
involved in proteolysis and processing of many cellular
proteins, including cell cycle regulators, oncoproteins
and tumor suppressors, transcriptional activators, ER
membrane proteins and cell surface receptors. In most
cases, ubiquitination of the target protein signals its
degradation by the 26S proteasome. Degradation of a
protein via the ubiquitin-proteasome pathway involves
two discrete and successive steps: (i) covalent attachment
of multiple ubiquitin molecules to the protein
substrate; and (ii) degradation of the tagged protein by
the 26S proteasome. Conjugation of ubiquitin involves
activation and transfer of ubiquitin from the
ubiquitin-activating enzyme, E1, to one of several
ubiquitin-carrier proteins, E2s (known also as
ubiquitin-conjugating enzymes, UBCs). E2 transfers the activated
ubiquitin moiety to the target substrate that is
specifically bound to a member of the ubiquitin-protein
ligase family, E3. Subsequent processive transfer of
additional activated ubiquitin molecules and their
conjugation to previously attached moieties generates
a polyubiquitin chain that serves as a degradation
signal for the proteasome. Binding of the substrate to
E3 plays an essential role in specific substrate
recognition (for recent reviews on the ubiquitin-proteasome
pathway, see, for example, Kornitzer and
Ciechanover, 2000; Voges et al., 1999). In most cases,
the first ubiquitin molecule is transferred to an ε-NH2
group of an internal Lysine residue of the target
protein. However, targeting of the myogenic transcription
factor MyoD (Breitschopf et al., 1998) and of the
Epstein Barr Virus (EBV) Latent Membrane Protein-1
(LMP-1; Aviel et al., 2000) involves initial ubiquitination
of the N-terminal residue followed by synthesis of
a poly-ubiquitin chain attached to an internal Lys
residue of this N-terminally attached ubiquitin moiety.
Thus, unlike many known substrates of the ubiquitin
system, degradation of these proteins does not require
any internal Lys residue. It has also been reported that
a mutant lysine-less α chain of the T cell receptor
(TCR) is also degraded by the proteasome, in a process
that depends on an intact ubiquitin system. However, a
role for direct ubiquitination of the substrate, as well
as identification of potential ubiquitination sites, has
not been discerned (Yu and Kopito, 1999). Similarly,
ubiquitin-mediated endocytosis and degradation of the
growth hormone receptor also proceeds in the absence
of any lysine residue (Govers et al., 1999). The inability
to identify ubiquitin adducts of the two receptors led
to the hypothesis that ubiquitination of another, yet to
be identified factor, plays a role in the endocytic
process.

Discovery of additional substrates is essential in
order to establish N-terminal ubiquitination as a novel
targeting pathway, to analyze the structural motifs
involved, to identify the conjugating enzymes, and in
particular the ubiquitin ligase, E3, and to study the
physiological significance of this new pathway.

Here we show that HPV-16 E7 is a novel substrate
of the ubiquitin pathway that is targeted for degradation
via N-terminal ubiquitination.



Results 

Degradation of E7 in a cell free reconstituted system
requires ATP, formation of a polyubiquitin chain and
the ubiquitin-carrier protein E2-F1

To study the mechanisms that underlie the degradation
of E7, we reconstituted a cell free proteolytic system.
As can be seen in Figure 1a, degradation of the protein
requires three components: ATP, ubiquitin, and the
ubiquitin carrier protein (E2) E2-F1 (E2-F1 is the
rabbit homolog of the human UbcH7). Omission of
any one of these components from the reaction
mixture abolished degradation. To further study the
mechanism of ubiquitin action, we investigated whether
formation of a polyubiquitin chain is required to
promote degradation. To that end, we used the
methylated derivative of ubiquitin (MeUb) that can modify
the target protein only once and serves as a chain
terminator (Hershko and Heller, 1985). As can be seen
in Figure 1b, MeUb strongly inhibited degradation of
E7 in the cell free system. This inhibition can be
alleviated by the addition of excess free WT ubiquitin.
To demonstrate directly generation of a substrate
anchored polyubiquitin chain, we incubated labeled
E7 in crude HeLa cell extract in the absence or
presence of ATPγS. This nucleotide can support the
activity of the ubiquitin activating enzyme E1 (in which
the α-β bond is utilized), but not the activity of the 26S
proteasome that requires cleavage of the β-γ bond
(Johnston and Cohen, 1991). As can be seen in the
experiment depicted in Figure 1c, incubation of labeled
E7 in the presence of ATPγS generates a polyubiquitin
chain that is anchored to the substrate.

Degradation of E7 in cells is mediated by the proteasome 

To study the mechanism(s) that underlie degradation of 
E7 in vivo, we followed the stability of the protein in cells 
in the absence or presence of the specific proteasome 
inhibitor lactacystin. As can be seen in Figure 2, E7 is a
short lived protein. Measurements in different experiments
have demonstrated that the half life of the protein
is ~30-40 min (see also Figure 7). Here, after 1 h, more
than 70% of the protein is degraded. Addition of
lactacystin inhibited degradation completely.
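
The two figures quoted above (a half life of ~30-40 min, and more than 70% of the protein degraded after 1 h) can be compared under a simple decay model. The sketch below is illustrative only. It assumes first-order (single-exponential) decay, which the text does not state, and the function names are hypothetical.

```python
import math


def fraction_remaining(t_min: float, half_life_min: float) -> float:
    """Fraction of protein left after t_min, assuming first-order decay."""
    return 0.5 ** (t_min / half_life_min)


def implied_half_life(t_min: float, remaining: float) -> float:
    """Half life implied by observing a given remaining fraction at t_min."""
    return t_min * math.log(2) / math.log(1.0 / remaining)


if __name__ == "__main__":
    for half_life in (30.0, 40.0):
        degraded = 1.0 - fraction_remaining(60.0, half_life)
        print(f"half life {half_life:.0f} min: {degraded:.1%} degraded after 60 min")
    # Exactly 70% degraded after 60 min corresponds to this half life:
    print(f"implied half life: {implied_half_life(60.0, 0.30):.1f} min")
```

Under this assumption, a 30 min half life gives 75% degraded after 1 h and a 40 min half life gives about 65%, so "more than 70% degraded" corresponds to a half life shorter than about 35 min, which sits within the stated range.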

Degradation of a lysine-less E7 in a cell free reconstituted
system requires ATP, formation of a polyubiquitin chain
and the ubiquitin-carrier protein E2-F1

We have previously shown that degradation of the
transcriptional activator MyoD requires attachment of
the first ubiquitin moiety to the N-terminal free amino
acid residue and not to any internal Lys residue of the
protein (Breitschopf et al., 1998). Similarly, ubiquitin-
mediated degradation of the Latent Membrane Protein
1 (LMP1) of the Epstein-Barr Virus is not dependent
Figure 1 E7 is conjugated to ubiquitin and degraded in vitro in
an ATP-, E2-F1-, and ubiquitin-dependent manner. (a) Degradation
of E7 requires ATP, ubiquitin and E2-F1. Degradation of in
vitro translated and 35S-methionine labeled E7 was monitored in a
cell free reconstituted system that contained reticulocyte Fraction
II as described under Materials and methods. Ubiquitin, ATP,
and E2-F1 were added as indicated. To avoid contamination of
the labeled substrate with E2-F1, it was fractionated over DEAE
prior to its addition to the reaction mixture as described under
Materials and methods. (b) Degradation of E7 requires ubiquitin
and formation of a polyubiquitin chain. Degradation of in vitro
translated and labeled E7 was monitored in a cell free
reconstituted system that contained HeLa cell Fraction II and
ATP as described under Materials and methods. Ubiquitin and
MeUb were added as indicated. (c) Conjugation of ubiquitin to
E7. In vitro translated and labeled E7 was incubated in complete
HeLa cell extract in the absence or presence of ATPγS as
indicated and as described under Materials and methods.



on the single internal Lys residue of the molecule, and 
also requires initial fusion of ubiquitin to the N- 
terminal residue (Aviel et aL, 2000). To study whether 
a similar mechanism is also involved in the degradation 
of E7, we replaced the two Lys residues in positions 60 
and 97 with Arg. As can be seen in the experiment 
depicted in Figure 3, similar to the WT protein, 
degradation of the lysine-less E7 in a cell free 
reconstituted system also requires ubiquitin, E2-F1, 
and ATP (Figure 3a). Degradation requires formation 
of a polyubiquitin chain (Figure 3b,c). As noted for 
MyoD and LMPl, the amount of ubiquitin adducts 
formed is lower in the case of the lysine-less mutant 
(compare c in Figures 1 and 3), suggesting that in the 
WT protein, internal Lys residues can also play a role, 
though not an essential one, in the proteolytic process. 



Figure 2 Degradation of E7 in cells is sensitive to proteasome
inhibition. Cos 7 cells were transiently transfected with cDNA
coding for E7. Degradation of the protein was monitored in a
pulse-chase labeling and immunoprecipitation experiment in the
absence or presence of the cell permeable proteasome inhibitor
clasto-lactacystin β-lactone as described under Materials and
methods



Degradation of lysine-less E7 in cells depends on an active 
proteasome 

Similar to the WT protein, degradation of the lysine- 
less mutant in cells is also mediated by the proteasome 
(Figure 4). 

Blocking of the N- but not of the C- terminus of E7 inhibits 
both conjugation and degradation of E7 both in vitro and 
in vivo 

To study the involvement of the N-terminal domain in
targeting the protein for N-terminal residue ubiquitination,
we fused a 6×Myc tag to the N-terminal and the
C-terminal residues of E7. We predicted that if the
N-terminal domain is involved in specific recognition of the
protein, and ubiquitin is fused only to a certain
amino acid sequence, moving this sequence downstream
from the N-terminal domain will inhibit both
conjugation and degradation. Indeed, as can be seen in
Figure 5, conjugation of an N-terminally Myc-tagged
WT E7 (that contains the two internal Lys residues) is
strongly inhibited compared to a similar protein that
contains a Myc tag fused to its C-terminal residue
(Figure 5a). Similarly, while the degradation of a
C-terminally Myc-tagged E7 in a cell free system proceeds
efficiently, the degradation of an N-terminally Myc-tagged
WT protein is strongly inhibited (Figure 5b). Not
surprisingly, the degradation of a similar lysine-less
mutant is also inhibited (Figure 5b). We noted that the
C-terminally tagged mutant migrates slower than its
N-terminally tagged counterparts; the reason for this
peculiar behavior is not known. Similarly, in cells,
N-terminally tagged WT and lysine-less E7s are stable,
unlike the C-terminally tagged WT protein (Figure 6).

The N-terminal domain of E7 contains the 
ubiquitination signal 

To study directly the role of the N-terminal domain of 
E7 as a ubiquitination signal, we deleted the first 11 or 
7 amino acids of the protein. As can be seen in Figure 
7a, deletion of the first 11 residues stabilizes the
protein. In contrast, degradation of a mutant protein
from which the first 7 amino acid residues were deleted
proceeded similarly to that of the WT protein (Figure
7b). The finding that the signal is constituted of a
relatively long segment raises the hypothesis that it
serves not only as an anchor for specific ubiquitination,
but also as a binding site for the ubiquitin ligase E3.

Figure 3 Lysine-less E7 is conjugated to ubiquitin and degraded
in vitro in an ATP-, E2-F1-, and ubiquitin-dependent manner. (a)
Degradation of lysine-less E7 requires ATP, ubiquitin and E2-F1.
Degradation of in vitro translated and 35S-methionine labeled
lysine-less E7 was monitored in a cell free reconstituted system
that contained reticulocyte Fraction II as described under
Materials and methods. Ubiquitin, ATP, and E2-F1 were added
as indicated. To avoid contamination of the labeled substrate with
E2-F1, it was fractionated over DEAE prior to its addition to the
reaction mixture as described under Materials and methods. (b)
Degradation of lysine-less E7 requires ubiquitin and formation of
a polyubiquitin chain. Degradation of in vitro translated and
labeled lysine-less E7 was monitored in a cell free reconstituted
system that contained HeLa cell Fraction II and ATP as described
under Materials and methods. Ubiquitin and MeUb were added
as indicated. (c) Conjugation of ubiquitin to lysine-less E7. In
vitro translated and labeled lysine-less E7 was incubated in
complete HeLa cell extract in the absence or presence of ATPγS
as indicated and as described under Materials and methods

Figure 4 Degradation of lysine-less E7 in cells is sensitive to
proteasome inhibition. Cos 7 cells were transiently transfected
with cDNA coding for lysine-less E7. Degradation of the protein
was monitored in a pulse-chase labeling and immunoprecipitation
experiment in the absence or presence of the cell permeable
proteasome inhibitor clasto-lactacystin β-lactone as described
under Materials and methods

Figure 5 Conjugation (a) and degradation (b) of N- and
C-terminally Myc-tagged WT and lysine-less E7 in vitro. (a)
ATPγS-dependent conjugation of N- and C-terminally Myc-tagged WT
E7. N-terminally (N-MT) and C-terminally (C-MT) Myc-tagged
in vitro translated and 35S-methionine-labeled WT-E7 were
subjected to ATPγS-driven conjugation in a complete HeLa cell
extract that also contains ubiquitin. Reactions were carried out
and conjugates resolved via SDS-PAGE as described under
Materials and methods. (b) Degradation of N- and C-terminally
Myc-tagged WT, and N-terminally Myc-tagged lysine-less E7 in a
cell free system. WT [WT(N)] and lysine-less [LL(N)]
N-terminally Myc-tagged and WT [WT(C)] C-terminally
Myc-tagged E7 were subjected to degradation in a cell free system
containing complete HeLa cell extract, ubiquitin, and ATP.
Reactions were carried out and proteins resolved via SDS-PAGE
as described under Materials and methods



Discussion 

Ubiquitin-mediated degradation of multiple key reg- 
ulatory proteins is involved in the regulation of many 
basic cellular processes. Here we show that the HPV E7 
oncoprotein is a substrate for the ubiquitin system. It is 
degraded in an ATP-dependent manner in a process
that also requires ubiquitin and the ubiquitin-carrier
protein E2-F1/UbcH7 (Figure 1a). It is not clear
whether this E2 is the only carrier protein involved,
or whether other E2s, such as members of the UbcH5
family that are involved in the degradation of the bulk
of cellular proteins, are also required. The identity of
the ubiquitin-protein ligase E3 that binds the substrate
specifically is still obscure. Degradation of E7 requires
formation of a polyubiquitin chain: the chain terminator
methylated ubiquitin inhibits degradation of E7,
and inhibition can be relieved by the addition of excess
free WT ubiquitin (Figure 1b). Indeed, we were able to
demonstrate directly formation of polyubiquitinated E7
following incubation of the protein in the presence of
ATPγS (Figure 1c). This nucleotide promotes formation
of conjugates, but inhibits their degradation. In
cells, E7 is extremely unstable and has a t1/2 of ~30-40 min.
Incubation of cells in the presence of
lactacystin, a specific proteasome inhibitor, inhibited
degradation almost completely (Figure 2).

Figure 6 Degradation of N-terminally Myc-tagged WT and
lysine-less and C-terminally Myc-tagged WT E7 in cells. Cos 7
cells were transiently transfected with cDNAs coding for WT
[WT-MT(N)] and lysine-less [LL-MT(N)] N-terminally Myc-tagged
and C-terminally [WT-MT(C)] Myc-tagged E7. Stability
of the proteins was monitored in a pulse-chase labeling and
immunoprecipitation experiment as described under Materials
and methods

To further dissect the mechanisms that underlie the 
recognition and degradation of E7, it was important to 
study whether a specific Lys residue is essential for 
formation of the polyubiquitin chain, or whether the 
two Lys residues in positions 60 and 97 can equally 
serve as anchors to the polyubiquitin chain. We 
individually replaced each of the two Lys residues
with Arg; however, we could not observe any effect on
either ubiquitin-mediated conjugation and degradation
in vitro, or on the stability of the protein in vivo (not
shown). Therefore, we decided to replace both
residues. However, when incubated in a cell free
reconstituted system, and similar to the behavior of
the WT protein, degradation of the lysine-less E7 was
still dependent on ATP, E2-F1, and ubiquitin (Figure 3).
Also, in cells, degradation of the lysine-less protein
was sensitive to inhibition of the proteasome (Figure 4).
As no lysine residues were available for ubiquitination,
we suspected that the first ubiquitin residue is
attached to the free N-terminal NH2 group, as is the
case for MyoD (Breitschopf et al., 1998) and EBV
LMP1 (Aviel et al., 2000). To study the possibility that
E7 is also targeted via initial N-terminal ubiquitination,
and with the assumption that the N-terminal
domain of the protein determines the specificity for
this process, we altered the N-terminal domain of both
WT and the lysine-less E7 by fusing it with a Myc-tag.
For both forms, tagging resulted in major stabilization
of the protein in vitro (Figure 5b) and in vivo (Figure
6). Concomitantly, we noted a marked decrease in
conjugation of the tagged WT protein in a cell free
system (Figure 5a). This decrease occurred despite the
presence of the two internal Lys residues in the
protein. The effects of Myc tagging on both conjugation
and degradation are even more striking if one
takes into consideration the existence of six additional
lysine residues in the tag. The finding that the Myc-tagged
WT E7 can still be conjugated is probably due
to the existence of the two internal lysine residues.
Interestingly, while after 60 min the protein appears
stable (Figure 6), it is still degraded at a much slower
rate in a ubiquitin-dependent mode (not shown). This
finding demonstrates that the internal lysine residues
(either of the protein and/or of the tag) can play a role
in the process, though not a major one. In the native
protein, they may serve as modulators of proteolysis.
A similar observation was noted also for MyoD and
LMP1. The Myc tag can affect the stability of a
protein that is targeted for degradation following
ubiquitination of the N-terminal residue by blocking
the access of ubiquitin, the ligase, or both, to a specific
ubiquitination site and/or recognition motif at the
N-terminal domain. To test this notion directly, we
deleted the first 7 and 11 N-terminal amino acid
residues. As can be seen in Figure 7, deletion of the
first 11 residues stabilized the protein significantly. In
contrast, deletion of the first seven residues was not
sufficient to confer stability. Our initial results indicate
that deletion of amino acid residues 8-11 results in
stabilization of the protein, suggesting that they may
play an important role in the recognition and targeting
of the protein (unpublished data). Obviously, it will be
important to study whether this sequence can serve as
a 'universal' transferable N-terminal destabilizing
element. It should be noted, however, that we could
not find sequence homology between the N-terminal
regions of E7, MyoD, LMP1, and TCRα (it is not
clear that this protein is degraded via N-terminal
ubiquitination; Yu and Kopito, 1999), but such
functional motifs or epitopes may be generated
following folding of the protein rather than at the
primary sequence level. It is quite possible that the
motifs are different and targeted by different ligases.
Future identification of the E3 will be necessary to
resolve the question of whether the N-terminal region
serves also as a recognition site for the ligase. In this
context it is worth mentioning the case of the Cdk
inhibitor p21Cip1, where the researchers suggested that,
similar to ODC, the protein is targeted by the
proteasome in a process that does not involve
ubiquitination (Sheaff et al., 2000); a lysine-less
N-terminally Myc-tagged protein is still unstable and
degraded in a proteasome-dependent manner.

Figure 7 Degradation of 11 (a) and 7 (b) amino acid residues
N-terminally deleted E7s in cells. (a) Degradation of 11 amino acid
residues N-terminally deleted WT E7. Cos 7 cells were transiently
transfected with a cDNA that codes for E7 lacking the first 11
amino acid residues (residues 2-12; ΔN11). Stability of the WT
and the deletion mutant were monitored in a pulse-chase labeling
and immunoprecipitation experiment as described under Materials
and methods. Band intensities were quantified by analysis of
the imaging data and plotted as relative percentage of the signal
at time 0. (b) Degradation of 7 amino acid residues N-terminally
deleted WT E7. Cos 7 cells were transiently transfected with a
cDNA that codes for E7 lacking the first 7 amino acid residues
(residues 2-8; ΔN7). Stability of the WT and the deletion mutant
and quantitative analysis of the data were carried out as described
under Materials and methods and above
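
The quantification described for Figure 7 (band intensities expressed as a percentage of the signal at time 0) can be sketched as follows. This is a minimal illustration, not the authors' analysis procedure: the log-linear fit assumes first-order decay, the function names are hypothetical, and the intensities are made-up numbers rather than data from this study.

```python
import math


def percent_of_time_zero(intensities):
    """Express each band intensity as a percentage of the time-0 band."""
    baseline = intensities[0]
    return [100.0 * value / baseline for value in intensities]


def half_life_from_chase(times_min, intensities):
    """Estimate t1/2 from a least-squares line through ln(intensity) vs. time.

    Assumes first-order decay; returns infinity if no decay is detected.
    """
    logs = [math.log(value) for value in intensities]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    slope = sxy / sxx
    if slope >= 0:
        return math.inf  # stable protein: signal does not decline
    return math.log(2) / -slope


# Hypothetical chase times (min) and band intensities (arbitrary units).
times = [0, 30, 60, 120]
bands = [800.0, 400.0, 200.0, 50.0]
print(percent_of_time_zero(bands))
print(half_life_from_chase(times, bands))
```

In such a scheme, a stabilized mutant such as ΔN11 would show a shallow or flat slope, whereas the WT protein and ΔN7 would give comparable, steeper slopes.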

It should be emphasized that N-terminal ubiquitina- 
tion is different from the recognition via the N-end rule 
(Varshavsky, 1996) where the protein is recognized via 
the N-terminal residue, but conjugation occurs on 
internal lysines. 

It is clear that for N-terminal ubiquitination to 
occur, a protein must have a free and exposed N- 
terminal residue. Thus, proteins acetylated at the N- 
terminal residue cannot be targeted via this pathway. 
While 80% of cellular proteins are modified at the
N-terminal residue by acetylation (Jörnvall, 1975), 20%
bear free and exposed N-termini. It is possible that
these proteins are targeted via N-terminal ubiquitination.
Indeed, in agreement with the set of 'rules' that
determines, according to the three N-terminal residues,
whether a protein will be acetylated, E7, which has
MHG as its first three residues, should not be
modified. However, since the set of 'rules' was
determined for an extremely limited number of
proteins, mostly yeast proteins (Polevoda et al.,
1999; Boissel et al., 1988; Huang et al., 1987), it is
not clear whether it is possible to predict which
proteins will be subjected to N-terminal ubiquitination.
The hypothesis, however, can be tested experimentally.
Furthermore, manipulation of proteins by
altering their N-terminal domain and rendering them
susceptible or resistant to acetylation can also
corroborate or rule out the notion that N-terminal
ubiquitination is the pathway of 'choice' for proteins
with free and exposed N-termini.

Materials



Materials for SDS-PAGE and Bradford reagent were from 
Bio-Rad. A mixture of L-[35S]-labeled methionine and cysteine
for metabolic labeling, [35S]methionine for in vitro translation,
as well as pre-stained MW markers and immobilized Protein
G, were obtained from Amersham Pharmacia Biotech. Tissue
culture sera and media were from Biological Industries (Bet
Ha'emek, Israel) or from Sigma. Antibody against E7 was
from Santa Cruz. clasto-Lactacystin β-lactone was from
Calbiochem. Ubiquitin, dithiothreitol (DTT), adenosine-5'-triphosphate
(ATP), phosphocreatine, creatine phosphokinase,
2-deoxyglucose and [Tris(hydroxymethyl)aminomethane]
(Tris buffer) were from Sigma. Hexokinase and
Fugene™ 6 transfection reagent were from Roche Molecular
Biochemicals. Wheat germ extract-based transcription-translation
coupled kit (TNT®) was from Promega. Restriction
and modifying enzymes were from New England Biolabs.
Oligonucleotides were synthesized by Biotechnology General
(Rehovot, Israel). All other reagents were of high analytical
grade. 

Methods 

Cell lines Cos-7 cells were grown at 37°C in Dulbecco's Modified
Eagle's Medium supplemented with 10% fetal calf serum
(FCS). All transfections were carried out using the Fugene™
reagent, and cells were analysed after 36-48 h.

Plasmids and construction of mutant cDNAs WT and lysine-less
mutant E7 cDNAs were subcloned into the EcoRI-XbaI
site of the pCS2 and pCS2+MT vectors (Breitschopf
et al., 1998; Aviel et al., 2000). These vectors were used for
both in vitro translation (under the control of SP6 RNA
polymerase) and expression in mammalian cells. Point
mutations in E7 were generated by site-directed mutagenesis
using the QuickChange™ kit (Stratagene). Deletion of the
first 7 (ΔN7) or 11 (ΔN11) N-terminal amino acid residues
of E7 was carried out using PCR and specific primers. PCR
products were digested with EcoRI and XbaI and ligated
into the pCS2 vector. In-frame insertion of the 6×Myc tag at
the N or C terminus of E7 was carried out using the
pCS2+MT vector and the appropriate PCR primers.
Sequences of all constructs were confirmed using an
automatic sequencing system (ABI 310).
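
At the protein level, the mutants described above amount to simple sequence edits: Lys-to-Arg substitutions at given positions, and N-terminal deletions that keep the initiator Met (the Figure 7 legend describes ΔN11 as lacking residues 2-12 and ΔN7 as lacking residues 2-8). The sketch below illustrates that bookkeeping only. The function names are hypothetical, and the 16-residue sequence is a made-up placeholder, not the E7 sequence.

```python
def delete_n_terminal(seq: str, n: int) -> str:
    """Remove the n residues that follow the initiator Met (residues 2..n+1)."""
    if not seq.startswith("M"):
        raise ValueError("sequence is expected to start with Met")
    return seq[0] + seq[n + 1:]


def lys_to_arg(seq: str, positions) -> str:
    """Replace Lys with Arg at the given 1-based positions."""
    residues = list(seq)
    for pos in positions:
        if residues[pos - 1] != "K":
            raise ValueError(f"residue {pos} is not Lys")
        residues[pos - 1] = "R"
    return "".join(residues)


# Placeholder sequence for illustration only (Lys at positions 7 and 16).
toy = "MHGDEFKLMNPQRSTK"
print(delete_n_terminal(toy, 7))   # residues 2-8 removed
print(lys_to_arg(toy, [7, 16]))    # lysine-less variant of the toy sequence
```

For the actual protein, the same calls would use the E7 sequence with positions 60 and 97 for the lysine-less mutant, and n = 7 or n = 11 for the deletion mutants.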

Preparation and fractionation of crude reticulocyte lysate
Reticulocytes were induced in rabbits and lysates were prepared
as described (Hershko et al., 1983). The lysate was
fractionated over DEAE cellulose into unadsorbed material
(Fraction I) and high salt eluate (Fraction II) as described
(Hershko et al., 1983). E2-F1 was prepared from Fraction I
as described (Blumenfeld et al., 1994). HeLa cell extract was
prepared by hypotonic lysis as described previously (Orian et
al., 2000) and fractionated as described above.

Conjugation and degradation of E7 in a cell free reconstituted
system The E7 cDNAs were translated in the presence of
[35S]methionine using wheat germ coupled transcription-translation
extract (TNT®, Promega) and SP6 RNA
polymerase. When indicated, the crude lysate that contains
the labeled substrate was fractionated over DEAE resin to
Fraction I and II as described above. The labeled substrate
was contained in Fraction II. Conjugation and degradation
assays in a cell-free system were performed as described
elsewhere (Breitschopf et al., 1998; Aviel et al., 2000). Briefly,
the reaction mixture contained, in a final volume of 12.5 μl: 50 μg
whole HeLa cell lysate proteins, or 50 μg reticulocyte
Fraction II and 1 μg E2-F1 as indicated, 5 μg ubiquitin,
and ~25 000 c.p.m. of in vitro translated labeled E7.
Reactions were performed in the presence of 0.5 mM ATP
and an ATP-regenerating system (10 mM phosphocreatine
and 0.5 μg phosphocreatine kinase), or ATPγS (5 mM) as
indicated. For depletion of ATP, 0.5 μg hexokinase and
20 mM deoxyglucose were added. When indicated, the chain
terminator methylated ubiquitin (MeUb; Hershko and Heller,
1985) was added at 5.0 μg. In these reactions ubiquitin was
present at 1.0 μg. To overcome the inhibition by MeUb,
15 μg of ubiquitin were added. Conjugation assays contained
in addition 0.5 μg of the isopeptidase inhibitor ubiquitin
aldehyde (UbAl; Hershko and Rose, 1987). Degradation
reactions were carried out at 37°C for 2 h, whereas
conjugation assays were incubated at 37°C for 1 h. Reactions
were terminated by the addition of sample buffer and
resolved by SDS-PAGE (15%). E7 was visualized by
PhosphorImager (Fuji, Japan).
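
The ubiquitin and MeUb amounts in these reactions can be restated as mass concentrations and MeUb-to-ubiquitin ratios, which makes the logic of the chain-termination experiment easier to follow. The sketch below rests on two assumptions: the masses are read as micrograms per 12.5 μl reaction, and the 15 μg of ubiquitin used to overcome MeUb inhibition is treated as the total ubiquitin in that reaction. The condition labels and function name are illustrative.

```python
REACTION_VOLUME_UL = 12.5  # final reaction volume stated in the Methods


def mass_conc(mass_ug: float, volume_ul: float = REACTION_VOLUME_UL) -> float:
    """Mass concentration in ug/ul (numerically equal to mg/ml)."""
    return mass_ug / volume_ul


# Masses in ug per reaction; the "excess" row assumes 15 ug is total ubiquitin.
conditions = {
    "standard": {"ubiquitin": 5.0, "MeUb": 0.0},
    "MeUb": {"ubiquitin": 1.0, "MeUb": 5.0},
    "MeUb + excess ubiquitin": {"ubiquitin": 15.0, "MeUb": 5.0},
}

for name, masses in conditions.items():
    ub_conc = mass_conc(masses["ubiquitin"])
    ratio = masses["MeUb"] / masses["ubiquitin"]
    print(f"{name}: ubiquitin {ub_conc:.2f} mg/ml, MeUb:ubiquitin = {ratio:.2f}")
```

Read this way, MeUb is in fivefold mass excess over ubiquitin in the inhibited reaction, and adding excess ubiquitin reverses the ratio to roughly 1:3, consistent with relief of inhibition.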

Stability of proteins in vivo Cellular stability of E7 proteins
was monitored in pulse-chase labeling and immunoprecipitation
experiments as described (Breitschopf et al., 1998).
The proteasome inhibitor clasto-lactacystin β-lactone (10 μM)
was added 20 min prior to the end of the labeling period
(pulse) and was present throughout the experiment. Following
labeling, cells were harvested (time 0: pulse) or were
further incubated for the indicated periods of time (chase).

References 

Antinore MJ, Birrer MJ, Patel D, Nader L and McCance DJ. (1996). EMBO J., 15, 1950-1960.
Arbeit JM, Münger K, Howley PM and Hanahan D. (1993). Am. J. Pathol., 142, 1187-1197.
Arroyo M, Bagchi S and Raychaudhuri P. (1993). Mol. Cell. Biol., 13, 6537-6546.
Auewarakul P, Gissmann L and Cid-Arregui A. (1994). Mol. Cell. Biol., 14, 8250-8258.
Aviel S, Winberg G, Masucci M and Ciechanover A. (2000). J. Biol. Chem., 275, 23491-23499.
Blumenfeld N, Gonen H, Mayer A, Smith CE, Siegel NR, Schwartz AL and Ciechanover A. (1994). J. Biol. Chem., 269, 9574-9581.
Boissel J-P, Kasper TJ and Bunn FH. (1988). J. Biol. Chem., 263, 8443-8449.
Boyer SN, Wazer DE and Band V. (1996). Cancer Res., 56, 4620-4624.
Bradford MM. (1976). Anal. Biochem., 72, 248-254.
Breitschopf K, Bengal E, Ziv T, Admon A and Ciechanover A. (1998). EMBO J., 17, 5964-5973.
Chen LP, Ashe S, Singhal MC, Galloway DA, Hellstrom I and Hellstrom KE. (1993). Proc. Natl. Acad. Sci. USA, 90, 6523-6527.
Crook T, Morgenstern JP, Crawford L and Banks L. (1989). EMBO J., 8, 513-519.
Davies R, Hicks R, Crook T, Morris J and Vousden K. (1993). J. Virol., 67, 2521-2528.
Funk J, Waga S, Harry J, Espling E, Stillman B and Galloway D. (1997). Genes Dev., 11, 2090-2100.
Govers R, ten-Broeke T, van-Kerkhof P, Schwartz AL and Strous GJ. (1999). EMBO J., 18, 28-36.
Greenhalgh DA, Wang XJ, Rothnagel JA, Eckhardt JN, Quintanilla MI, Barber JL, Bundman DS, Longley MA, Schlegel R and Roop DR. (1994). Cell Growth Differ., 5, 667-675.
Hawley-Nelson P, Vousden KH, Hubbert NL, Lowy DR and Schiller JT. (1989). EMBO J., 8, 3905-3910.
Hershko A, Heller H, Elias S and Ciechanover A. (1983). J. Biol. Chem., 258, 8206-8214.
Hershko A and Heller H. (1985). Biochem. Biophys. Res. Commun., 128, 1079-1086.
Hershko A and Rose IA. (1987). Proc. Natl. Acad. Sci. USA, 84, 1829-1833.



Cells were lysed, and the labeled proteins were precipitated
using anti-E7 antibody. Immune complexes were collected

Protein concentration Protein concentration was determined
according to Bradford (1976) using BSA as a standard.